Motion candidate derivation based on spatial neighboring block in sub-block motion vector prediction

ABSTRACT

Devices, systems and methods for the simplification of sub-block motion candidate lists for video coding are described. In a representative aspect, a method for video processing includes determining, during a conversion between a current block and a bitstream representation of the current block, a temporal motion vector prediction candidate for a sub-block of the current block. The temporal motion vector prediction candidate is completely determined based on K neighboring blocks of the current block, K being a positive integer. The method also includes performing the conversion based on the temporal motion vector prediction candidate for the sub-block.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/IB2019/059109, filed on Oct. 24, 2019, which claims the priority to and benefit of International Patent Application No. PCT/CN2018/111587, filed on Oct. 24, 2018 and International Patent Application No. PCT/CN2018/124984, filed on Dec. 28, 2018. All the aforementioned patent applications are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

This patent document is directed generally to image and video coding technologies.

BACKGROUND

In spite of the advances in video compression, digital video still accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, it is expected that the bandwidth demand for digital video usage will continue to grow.

SUMMARY

Devices, systems and methods related to digital video coding, and specifically, to simplifying sub-block motion candidate lists for video coding are described. The described methods may be applied to both the existing video coding standards (e.g., High Efficiency Video Coding (HEVC)) and future video coding standards or video codecs.

In one representative aspect, the disclosed technology can be used to provide a method for video processing. The method includes determining, during a conversion between a current block of visual media data and a bitstream representation of the current block, a temporal motion vector prediction candidate for at least a sub-block of the current block and performing the conversion based on the temporal motion vector prediction candidate for the sub-block. The temporal motion vector prediction candidate is determined based on K neighboring blocks of the current block, K being a positive integer.

In another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes determining, during a conversion between a current block of a video and a bitstream representation of the video, a temporal motion vector prediction candidate based on a temporal neighboring block of the current block. The temporal neighboring block is identified based on motion information of a spatial neighboring block selected from one or more spatial neighboring blocks that are different from at least one spatial neighboring block used in a merge list construction process of a video block. The method also includes performing the conversion based on the temporal motion vector prediction candidate.

In another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes maintaining, for a conversion between a current block of a video and a bitstream representation of the video, a table of motion candidates based on past conversions of the video and the bitstream representation; deriving a temporal motion vector prediction candidate based on the table of motion candidates; and performing the conversion based on the temporal motion vector prediction candidate.

In another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes determining, for a conversion between a current block of a video and a bitstream representation of the video, one or more temporal motion vector prediction candidates for the current block and performing the conversion based on the one or more temporal motion vector prediction candidates. The one or more temporal motion vector prediction candidates can be determined by identifying a first temporal adjacent block of the current block based on an initial motion vector, wherein the first temporal adjacent block includes invalid motion information, and examining additional temporal adjacent blocks to obtain the one or more temporal motion vector prediction candidates.

In another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes determining, for a conversion between a current block of a video and a bitstream representation of the video, one or more temporal motion vector prediction candidates for the current block. The one or more temporal motion vector prediction candidates comprise a default temporal motion vector prediction candidate. The method also includes performing the conversion based on the one or more temporal motion vector prediction candidates.

In another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes determining, for a conversion between a current block of a video and a bitstream representation of the video, a sub-block level merge candidate list that includes at least one sub-block coding type. The method also includes performing the conversion based on the sub-block level merge candidate list.

In another representative aspect, the disclosed technology may be used to provide a method for video processing that includes determining, for a conversion between a current block of a video and a bitstream representation of the video, a sub-block level coding technique based on an indication that is signaled in a picture header, a picture parameter set (PPS), a slice header, or a tile group header. The method also includes performing the conversion based on the sub-block level coding technique.

In another representative aspect, the disclosed technology may be used to provide a method for video processing that includes determining, for a conversion between a current block of a video and a bitstream representation of the video, a sub-block level temporal motion candidate using a derivation process applicable to a block level temporal motion vector prediction candidate conversion between the current block and the bitstream representation, and performing the conversion based on the sub-block level temporal motion candidate.

In another representative aspect, the disclosed technology may be used to provide a method for video processing that includes determining, for a conversion between a current block of a video and a bitstream representation of the video, a block level temporal motion vector prediction candidate using a derivation process applicable to a sub-block level temporal motion candidate conversion between the current block and the bitstream representation, and performing the conversion based on the block level temporal motion vector prediction candidate.

In another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes selecting, for sub-block level processing of a current video block, motion information associated with a spatial neighboring block, deriving, based on the motion information, a motion vector prediction candidate, adding the motion vector prediction candidate to a sub-block based merge list that is different from a merge list, where the sub-block based merge list excludes block-level prediction candidates, and reconstructing the current video block or decoding other video blocks based on the motion vector prediction candidate.

In another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes deriving, for sub-block level processing of a current video block, a motion vector prediction candidate, assigning a merge index to a type of the motion vector prediction candidate, and adding the motion vector prediction candidate and the merge index to a sub-block based merge list that is different from a merge list, where the sub-block based merge list excludes block-level prediction candidates.

In yet another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes deriving, for sub-block level processing of a current video block, a motion vector prediction candidate, and adding, based on an adaptive ordering, the motion vector prediction candidate to a sub-block based merge list that is different from a merge list, where the sub-block based merge list excludes block-level prediction candidates.

In another example aspect, a method of video processing is disclosed. The method includes determining a default motion candidate for a sub-block based coding mode for a conversion between a current video block and a bitstream representation of the current video block using one of the following: (a) a uni-prediction candidate that is derived by scaling a starting motion candidate to a reference picture index within a reference picture list X; or (b) a bi-prediction candidate that is derived by scaling to reference picture indexes within two reference picture lists; or (c) a candidate in either (a) or (b) depending on a picture type or a slice type of the current video block; or (d) a candidate derived for a temporal motion vector predictor (TMVP) process of the current video block.

In yet another representative aspect, the above-described method is embodied in the form of processor-executable code and stored in a computer-readable program medium.

In yet another representative aspect, a device that is configured or operable to perform the above-described method is disclosed. The device may include a processor that is programmed to implement this method.

In yet another representative aspect, a video decoder apparatus may implement a method as described herein.

The above and other aspects and features of the disclosed technology are described in greater detail in the drawings, the description and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example of sub-block based prediction.

FIGS. 2A and 2B show examples of the simplified 4-parameter affine model and the simplified 6-parameter affine model, respectively.

FIG. 3 shows an example of an affine motion vector field (MVF) per sub-block.

FIGS. 4A and 4B show example candidates for the AF_MERGE affine motion mode.

FIG. 5 shows an example of candidate positions for affine merge mode.

FIG. 6 shows another example of candidate positions for affine merge mode.

FIG. 7 shows an example of one coding unit (CU) with sub-blocks and neighboring blocks of the CU.

FIG. 8 shows yet another example of candidate positions for affine merge mode.

FIG. 9 shows an example of spatial neighboring blocks used for alternative temporal motion vector prediction (ATMVP) temporal block identification.

FIG. 10 shows an example of identifying an alternative starting point for ATMVP.

FIG. 11 shows a flowchart of an example method for video coding in accordance with the disclosed technology.

FIG. 12 shows a flowchart of another example method for video coding in accordance with the disclosed technology.

FIG. 13 shows a flowchart of yet another example method for video coding in accordance with the disclosed technology.

FIG. 14 is a block diagram of an example of a hardware platform for implementing a visual media decoding or a visual media encoding technique described in the present document.

FIG. 15 shows an example of how to identify the represented block for default motion derivation.

FIG. 16 is a block diagram of an example video processing system in which disclosed techniques may be implemented.

FIG. 17 is a flowchart representation of a method for video processing in accordance with the present disclosure.

FIG. 18 is a flowchart representation of another method for video processing in accordance with the present disclosure.

FIG. 19 is a flowchart representation of another method for video processing in accordance with the present disclosure.

FIG. 20 is a flowchart representation of another method for video processing in accordance with the present disclosure.

FIG. 21 is a flowchart representation of another method for video processing in accordance with the present disclosure.

FIG. 22 is a flowchart representation of another method for video processing in accordance with the present disclosure.

FIG. 23 is a flowchart representation of another method for video processing in accordance with the present disclosure.

FIG. 24A is a flowchart representation of another method for video processing in accordance with the present disclosure.

FIG. 24B is a flowchart representation of yet another method for video processing in accordance with the present disclosure.

DETAILED DESCRIPTION

Due to the increasing demand for higher resolution video, video coding methods and techniques are ubiquitous in modern technology. Video codecs typically include an electronic circuit or software that compresses or decompresses digital video, and are continually being improved to provide higher coding efficiency. A video codec converts uncompressed video to a compressed format or vice versa. There are complex relationships between the video quality, the amount of data used to represent the video (determined by the bit rate), the complexity of the encoding and decoding algorithms, sensitivity to data losses and errors, ease of editing, random access, and end-to-end delay (latency). The compressed format usually conforms to a standard video compression specification, e.g., the High Efficiency Video Coding (HEVC) standard (also known as H.265 or MPEG-H Part 2), the Versatile Video Coding (VVC) standard to be finalized, or other current and/or future video coding standards.

Sub-block based prediction was first introduced into the video coding standard by the High Efficiency Video Coding (HEVC) standard. With sub-block based prediction, a block, such as a Coding Unit (CU) or a Prediction Unit (PU), is divided into several non-overlapped sub-blocks. Different sub-blocks may be assigned different motion information, such as a reference index or motion vector (MV), and motion compensation (MC) is performed individually for each sub-block. FIG. 1 shows an example of sub-block based prediction.

Embodiments of the disclosed technology may be applied to existing video coding standards (e.g., HEVC, H.265) and future standards to reduce hardware implementation complexity or improve coding performance. Section headings are used in the present document to improve readability of the description and do not in any way limit the discussion or the embodiments (and/or implementations) to the respective sections only.

1. Examples of the Joint Exploration Model (JEM)

In some embodiments, future video coding technologies are explored using a reference software known as the Joint Exploration Model (JEM). In JEM, sub-block based prediction is adopted in several coding tools, such as affine prediction, alternative temporal motion vector prediction (ATMVP), spatial-temporal motion vector prediction (STMVP), bi-directional optical flow (BIO), and Frame-Rate Up Conversion (FRUC). Affine prediction has also been adopted into VVC.

1.1 Examples of Affine Prediction

In HEVC, only a translation motion model is applied for motion compensation prediction (MCP). In the real world, however, there are many kinds of motion, e.g., zoom in/out, rotation, perspective motions and other irregular motions. In VVC, a simplified affine transform motion compensation prediction is applied. As shown in FIGS. 2A and 2B, the affine motion field of the block is described by two (in the 4-parameter affine model that uses the variables a, b, e and f) or three (in the 6-parameter affine model that uses the variables a, b, c, d, e and f) control point motion vectors, respectively.

The motion vector field (MVF) of a block is described by the following equations for the 4-parameter affine model and the 6-parameter affine model, respectively:

$\left\{ \begin{matrix} mv^{h}(x,y) = ax - by + e = \frac{mv_{1}^{h} - mv_{0}^{h}}{w}x - \frac{mv_{1}^{v} - mv_{0}^{v}}{w}y + mv_{0}^{h} \\ mv^{v}(x,y) = bx + ay + f = \frac{mv_{1}^{v} - mv_{0}^{v}}{w}x + \frac{mv_{1}^{h} - mv_{0}^{h}}{w}y + mv_{0}^{v} \end{matrix} \right. \qquad \text{Eq. (1)}$

$\left\{ \begin{matrix} mv^{h}(x,y) = ax + cy + e = \frac{mv_{1}^{h} - mv_{0}^{h}}{w}x + \frac{mv_{2}^{h} - mv_{0}^{h}}{h}y + mv_{0}^{h} \\ mv^{v}(x,y) = bx + dy + f = \frac{mv_{1}^{v} - mv_{0}^{v}}{w}x + \frac{mv_{2}^{v} - mv_{0}^{v}}{h}y + mv_{0}^{v} \end{matrix} \right. \qquad \text{Eq. (2)}$

Herein, (mv^(h) ₀, mv^(v) ₀) is the motion vector of the top-left corner control point (CP), (mv^(h) ₁, mv^(v) ₁) is the motion vector of the top-right corner control point, and (mv^(h) ₂, mv^(v) ₂) is the motion vector of the bottom-left corner control point. (x, y) represents the coordinate of a representative point relative to the top-left sample within the current block. The CP motion vectors may be signaled (as in the affine AMVP mode) or derived on-the-fly (as in the affine merge mode). w and h are the width and height of the current block. In practice, the division is implemented by a right-shift with a rounding operation. In VTM, the representative point is defined to be the center position of a sub-block, e.g., when the coordinate of the top-left corner of a sub-block relative to the top-left sample within the current block is (xs, ys), the coordinate of the representative point is defined to be (xs+2, ys+2).
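As an illustration only, Equations (1) and (2) can be evaluated directly in floating point. The sketch below is not taken from any reference software; the function name `affine_mv` and the tuple layout (horizontal, vertical) are assumptions made for this example. For the 4-parameter model it uses c = −b and d = a, which follows from Equation (1).

```python
def affine_mv(x, y, w, h, mv0, mv1, mv2=None):
    """Evaluate Eq. (1) (when mv2 is None) or Eq. (2) at position (x, y).

    mv0, mv1, mv2 are (horizontal, vertical) control-point MVs of the
    top-left, top-right and bottom-left corners (hypothetical layout).
    """
    a = (mv1[0] - mv0[0]) / w
    b = (mv1[1] - mv0[1]) / w
    if mv2 is None:
        # 4-parameter model: mv^h = ax - by + e, mv^v = bx + ay + f
        c, d = -b, a
    else:
        # 6-parameter model: mv^h = ax + cy + e, mv^v = bx + dy + f
        c = (mv2[0] - mv0[0]) / h
        d = (mv2[1] - mv0[1]) / h
    e, f = mv0
    return (a * x + c * y + e, b * x + d * y + f)
```

Evaluating the model at a corner returns the corresponding control-point MV, which is a quick sanity check on the parameterization.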

In a division-free design, Equations (1) and (2) are implemented as:

$\left\{ \begin{matrix} iDMvHorX = \left( mv_{1}^{h} - mv_{0}^{h} \right) \ll \left( S - \log_{2}(w) \right) \\ iDMvHorY = \left( mv_{1}^{v} - mv_{0}^{v} \right) \ll \left( S - \log_{2}(w) \right) \end{matrix} \right. \qquad \text{Eq. (3)}$

For the 4-parameter affine model shown in Equation (1):

$\left\{ \begin{matrix} iDMvVerX = - iDMvHorY \\ iDMvVerY = iDMvHorX \end{matrix} \right. \qquad \text{Eq. (4)}$

For the 6-parameter affine model shown in Equation (2):

$\left\{ \begin{matrix} iDMvVerX = \left( mv_{2}^{h} - mv_{0}^{h} \right) \ll \left( S - \log_{2}(h) \right) \\ iDMvVerY = \left( mv_{2}^{v} - mv_{0}^{v} \right) \ll \left( S - \log_{2}(h) \right) \end{matrix} \right. \qquad \text{Eq. (5)}$

And thus, the motion vectors may be derived as:

$\left\{ \begin{matrix} mv^{h}(x,y) = \mathrm{Normalize}\left( iDMvHorX \cdot x + iDMvVerX \cdot y + \left( mv_{0}^{h} \ll S \right),\ S \right) \\ mv^{v}(x,y) = \mathrm{Normalize}\left( iDMvHorY \cdot x + iDMvVerY \cdot y + \left( mv_{0}^{v} \ll S \right),\ S \right) \end{matrix} \right. \qquad \text{Eq. (6)}$

$\mathrm{Normalize}(Z,S) = \left\{ \begin{matrix} (Z + \mathrm{Off}) \gg S & \text{if } Z \geq 0 \\ - \left( ( - Z + \mathrm{Off}) \gg S \right) & \text{otherwise} \end{matrix} \right., \quad \mathrm{Off} = 1 \ll (S - 1) \qquad \text{Eq. (7)}$

Herein, S represents the calculation precision, e.g., S=7 in VVC. In VVC, the MV used in MC for a sub-block with the top-left sample at (xs, ys) is calculated by Equation (6) with x=xs+2 and y=ys+2.

To derive the motion vector of each 4×4 sub-block, the motion vector of the center sample of each sub-block, as shown in FIG. 3, is calculated according to Equation (1) or (2), and rounded to 1/16 fraction accuracy. Then the motion compensation interpolation filters are applied to generate the prediction of each sub-block with the derived motion vector.
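The division-free derivation can be sketched as follows. This is a minimal illustration under stated assumptions, not the VTM implementation: the function names are hypothetical, w and h are assumed to be powers of two no larger than 2^S, and the final rounding to 1/16 accuracy is not modelled. The vertical deltas of the 4-parameter model are chosen so that the result agrees with Equation (1), i.e. the y-coefficient of mv^v equals the x-coefficient of mv^h.

```python
def normalize(z, s):
    """Rounded right shift that is symmetric around zero (Eq. (7))."""
    off = 1 << (s - 1)
    return (z + off) >> s if z >= 0 else -((-z + off) >> s)


def affine_subblock_mv(xs, ys, w, h, mv0, mv1, mv2=None, s=7):
    """MV of the sub-block whose top-left sample is (xs, ys), per Eq. (3)-(6).

    Assumes w and h are powers of two with log2 <= s (illustrative limit).
    """
    log2w = w.bit_length() - 1
    d_hor_x = (mv1[0] - mv0[0]) << (s - log2w)   # Eq. (3)
    d_hor_y = (mv1[1] - mv0[1]) << (s - log2w)
    if mv2 is None:
        # 4-parameter model, consistent with Eq. (1)
        d_ver_x = -d_hor_y
        d_ver_y = d_hor_x
    else:
        # 6-parameter model, Eq. (5)
        log2h = h.bit_length() - 1
        d_ver_x = (mv2[0] - mv0[0]) << (s - log2h)
        d_ver_y = (mv2[1] - mv0[1]) << (s - log2h)
    x, y = xs + 2, ys + 2                        # center of a 4x4 sub-block
    mv_h = normalize(d_hor_x * x + d_ver_x * y + (mv0[0] << s), s)
    mv_v = normalize(d_hor_y * x + d_ver_y * y + (mv0[1] << s), s)
    return mv_h, mv_v
```

For a 16×16 block with mv0 = (16, 0) and mv1 = (48, 0), the top-left sub-block gets (20, 4), which matches evaluating Equation (1) at (2, 2).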

An affine model can be inherited from a spatial neighbouring affine-coded block such as the left, above, above right, left bottom or above left neighbouring block, as shown in FIG. 4A. For example, if the neighbouring left bottom block A in FIG. 4A is coded in affine mode, as denoted by A0 in FIG. 4B, the control point (CP) motion vectors mv₀ ^(N), mv₁ ^(N) and mv₂ ^(N) of the top left corner, above right corner and left bottom corner of the neighbouring CU/PU which contains the block A are fetched. The motion vectors mv₀ ^(C), mv₁ ^(C) and mv₂ ^(C) (the last of which is only used for the 6-parameter affine model) of the top left, top right and bottom left corners of the current CU/PU are then calculated based on mv₀ ^(N), mv₁ ^(N) and mv₂ ^(N).

In some embodiments (e.g., VTM-2.0), if the current block is affine coded, the sub-block (e.g., a 4×4 block in VTM) LT stores mv0 and RT stores mv1. If the current block is coded with the 6-parameter affine model, LB stores mv2; otherwise (with the 4-parameter affine model), LB stores mv2′. Other sub-blocks store the MVs used for MC.

In some embodiments, when a CU is coded with affine merge mode, e.g., in AF_MERGE mode, it gets the first block coded with affine mode from the valid neighbouring reconstructed blocks. The selection order for the candidate block is from left, above, above right, left bottom to above left, as shown in FIG. 4A.

The derived CP MVs mv₀ ^(C), mv₁ ^(C) and mv₂ ^(C) of the current block can be used as CP MVs in the affine merge mode, or they can be used as the MVP for affine inter mode in VVC. It should be noted that for the merge mode, if the current block is coded with affine mode, after deriving the CP MVs of the current block, the current block may be further split into multiple sub-blocks and each sub-block derives its motion information based on the derived CP MVs of the current block.

2. Example Embodiments

Different from VTM wherein only one affine spatial neighboring block may be used to derive affine motion for a block, a separate list of affine candidates is constructed for the AF_MERGE mode.

(1) Insert Inherited Affine Candidates into Candidate List

In an example, an inherited affine candidate means that the candidate is derived from a valid neighboring reconstructed block coded with affine mode.

As shown in FIG. 5, the scan order for the candidate block is A₁, B₁, B₀, A₀ and B₂. When a block is selected (e.g., A₁), the following two-step procedure is applied:

(a) Firstly, use the three corner motion vectors of the CU covering the block to derive two/three control points of the current block; and

(b) Then, based on the control points of the current block, derive the sub-block motion for each sub-block within the current block.

(2) Insert Constructed Affine Candidates

In some embodiments, if the number of candidates in the affine merge candidate list is less than MaxNumAffineCand, constructed affine candidates are inserted into the candidate list.

A constructed affine candidate means that the candidate is constructed by combining the neighbor motion information of each control point.

The motion information for the control points is derived firstly from the specified spatial neighbors and temporal neighbor shown in FIG. 5. CPk (k=1, 2, 3, 4) represents the k-th control point. A₀, A₁, A₂, B₀, B₁, B₂ and B₃ are spatial positions for predicting CPk (k=1, 2, 3); T is the temporal position for predicting CP4.

The coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0), (0, H) and (W, H), respectively, where W and H are the width and height of the current block.

The motion information of each control point is obtained according to the following priority order:

-   For CP1, the checking priority is B₂→B₃→A₂. B₂ is used if it is available. Otherwise, if B₃ is available, B₃ is used. If both B₂ and B₃ are unavailable, A₂ is used. If all three candidates are unavailable, the motion information of CP1 cannot be obtained.
-   For CP2, the checking priority is B₁→B₀.
-   For CP3, the checking priority is A₁→A₀.
-   For CP4, T is used.
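The priority order above amounts to taking the first available position for each control point. A minimal sketch, assuming a hypothetical dictionary `available` that maps position names to motion information and omits (or maps to `None`) positions that are unavailable:

```python
# Checking priority per control point, as listed above.
CP_PRIORITY = {
    "CP1": ["B2", "B3", "A2"],
    "CP2": ["B1", "B0"],
    "CP3": ["A1", "A0"],
    "CP4": ["T"],
}


def control_point_motion(available):
    """Return {CP name: motion info or None} using the first available position."""
    result = {}
    for cp, order in CP_PRIORITY.items():
        result[cp] = None  # stays None when no position is available
        for pos in order:
            if available.get(pos) is not None:
                result[cp] = available[pos]
                break
    return result
```

For example, if only B₃, A₂ and A₀ are available, CP1 takes its motion from B₃ and CP3 from A₀, while CP2 and CP4 cannot be obtained.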

Secondly, the combinations of control points are used to construct the motion model.

Motion vectors of three control points are needed to compute the transform parameters in the 6-parameter affine model. The three control points can be selected from one of the following four combinations ({CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, {CP1, CP3, CP4}). For example, use the CP1, CP2 and CP3 control points to construct the 6-parameter affine motion model, denoted as Affine (CP1, CP2, CP3).

Motion vectors of two control points are needed to compute the transform parameters in the 4-parameter affine model. The two control points can be selected from one of the following six combinations ({CP1, CP4}, {CP2, CP3}, {CP1, CP2}, {CP2, CP4}, {CP1, CP3}, {CP3, CP4}). For example, use the CP1 and CP2 control points to construct the 4-parameter affine motion model, denoted as Affine (CP1, CP2).

The combinations of constructed affine candidates are inserted into the candidate list in the following order:

-   {CP1, CP2, CP3}, {CP1, CP2, CP4}, {CP1, CP3, CP4}, {CP2, CP3, CP4}, {CP1, CP2}, {CP1, CP3}, {CP2, CP3}, {CP1, CP4}, {CP2, CP4}, {CP3, CP4}

(3) Insert Zero Motion Vectors

If the number of candidates in the affine merge candidate list is less than MaxNumAffineCand, zero motion vectors are inserted into the candidate list until the list is full.
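The three insertion stages (inherited candidates, constructed candidates, zero motion vectors) can be sketched as one list-building routine. This is only an outline under simplifying assumptions: candidates are represented as opaque tuples, the function name and tuple layout are hypothetical, and any further checks a real codec may apply (such as pruning or reference index consistency) are not modelled.

```python
# Insertion order of control-point combinations for constructed candidates.
COMBINATION_ORDER = [
    ("CP1", "CP2", "CP3"), ("CP1", "CP2", "CP4"),
    ("CP1", "CP3", "CP4"), ("CP2", "CP3", "CP4"),
    ("CP1", "CP2"), ("CP1", "CP3"), ("CP2", "CP3"),
    ("CP1", "CP4"), ("CP2", "CP4"), ("CP3", "CP4"),
]


def build_affine_merge_list(inherited, cp_motion, max_num):
    """inherited: list of inherited candidates (opaque objects).
    cp_motion: {CP name: MV or None}; max_num plays the role of MaxNumAffineCand.
    """
    # (1) inherited candidates first
    candidates = list(inherited[:max_num])
    # (2) constructed candidates, only from fully available combinations
    for combo in COMBINATION_ORDER:
        if len(candidates) >= max_num:
            break
        if all(cp_motion.get(cp) is not None for cp in combo):
            mvs = tuple(cp_motion[cp] for cp in combo)
            candidates.append(("constructed", combo, mvs))
    # (3) pad with zero motion vectors until the list is full
    while len(candidates) < max_num:
        candidates.append(("zero", (), ((0, 0),)))
    return candidates
```

With one inherited candidate, CP3 unavailable and a list size of 7, the routine yields four constructed candidates followed by two zero candidates.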

3. Examples of Advanced Temporal Motion Vector Prediction (ATMVP)

In some existing implementations, advanced temporal motion vector prediction (ATMVP) was included in the benchmark set (BMS)-1.0 reference software, which derives multiple motions for the sub-blocks of one coding unit (CU) based on the motion information of the collocated blocks from temporal neighboring pictures. Although it improves the efficiency of temporal motion vector prediction, the following complexity issues are identified for the existing ATMVP design:

-   The collocated pictures of different ATMVP CUs may not be the same if multiple reference pictures are used. This means the motion fields of multiple reference pictures need to be fetched.
-   The motion information of each ATMVP CU is always derived based on 4×4 units, resulting in multiple invocations of motion derivation and motion compensation for each 4×4 sub-block inside one ATMVP CU.

3.1 Examples of Simplified Collocated Block Derivation with One Fixed Collocated Picture

In this example method, one simplified design is described to use the same collocated picture as in HEVC, which is signaled at the slice header, as the collocated picture for ATMVP derivation. At the block level, if the reference picture of a neighboring block is different from this collocated picture, the MV of the block is scaled using the HEVC temporal MV scaling method, and the scaled MV is used in ATMVP.

Denote the motion vector used to fetch the motion field in the collocated picture R_(col) as MV_(col). To minimize the impact due to MV scaling, the MV in the spatial candidate list used to derive MV_(col) is selected in the following way: if the reference picture of a candidate MV is the collocated picture, this MV is selected and used as MV_(col) without any scaling. Otherwise, the MV having a reference picture closest to the collocated picture is selected to derive MV_(col) with scaling.
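The selection rule for MV_(col) can be sketched as below. Several points are assumptions for illustration: pictures are identified by picture order count (POC), "closest" is interpreted as the smallest POC distance to the collocated picture, and the scaling is a plain linear ratio of POC distances rather than the fixed-point HEVC temporal MV scaling with clipping. The function name and the candidate layout are hypothetical.

```python
def select_mv_col(candidates, cur_poc, col_poc):
    """candidates: list of ((mv_x, mv_y), ref_poc) from the spatial candidate list.

    Returns the MV pointing to the collocated picture, or None if the list is empty.
    Assumes no candidate references the current picture itself (ref_poc != cur_poc).
    """
    # Prefer an MV that already references the collocated picture: no scaling.
    for mv, ref_poc in candidates:
        if ref_poc == col_poc:
            return mv
    if not candidates:
        return None
    # Otherwise take the MV whose reference picture is closest to the
    # collocated picture (assumed: smallest POC distance) and scale it.
    mv, ref_poc = min(candidates, key=lambda c: abs(c[1] - col_poc))
    scale = (cur_poc - col_poc) / (cur_poc - ref_poc)  # simplified linear scaling
    return (round(mv[0] * scale), round(mv[1] * scale))
```

Preferring an unscaled MV avoids the rounding error that scaling introduces, which is the motivation stated in the paragraph above.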

3.2 Examples of Adaptive ATMVP Sub-Block Size

In this example method, the slice-level adaptation of the sub-block size is supported for ATMVP motion derivation. In some cases, the ATMVP is also known as sub-block temporal motion vector prediction (sbTMVP). Specifically, one default sub-block size that is used for the ATMVP motion derivation is signaled at sequence level. Additionally, one flag is signaled at slice-level to indicate if the default sub-block size is used for the current slice. If the flag is false, the corresponding ATMVP sub-block size is further signaled in the slice header for the slice.

3.3 Examples of a Simplified ATMVP Derivation

In some embodiments, ATMVP predicts the motion vectors of the sub-CUs within a CU in two steps. The first step is to identify the corresponding block in the collocated picture signaled at the slice header. The second step is to split the current CU into sub-CUs and obtain the motion information of each sub-CU from the block corresponding to each sub-CU in the collocated picture.

In the first step, the collocated block is identified by always scanning the MVs of the spatial merge candidates twice (once for each list). The construction of the merge candidate list is performed by checking A₁→B₁→B₀→A₀→ATMVP→B₂→TMVP, as shown in FIG. 6. Therefore, the number of MVP candidates in the merge list is up to 4 before ATMVP, which means that in the worst case, the scanning process in the first step needs to check all 4 candidate blocks for each list.

To simplify the scanning process of the neighboring blocks, the method restricts the number of scanning passes for deriving the collocated block to one, which means that only the first available candidate in the merge list is checked. If the candidate does not satisfy the condition of ATMVP neighboring block scanning in the current VVC working draft (i.e., none of the motion vectors associated with list 0 and list 1 points to the collocated picture), a zero motion vector will be used to derive the collocated block in the collocated picture. In this method, the checking process is performed at most once. Such a motion vector (e.g., in the current design, it could be the motion associated with one spatial neighboring block, or a zero motion vector) is called a starting point MV for ATMVP.
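The simplified scan can be sketched as follows. The data layout is an assumption for illustration: each merge candidate is either `None` (unavailable) or a dictionary mapping a list index (0 or 1) to a pair of MV and reference POC, and checking list 0 before list 1 is also an assumption. The function name is hypothetical.

```python
def starting_point_mv(merge_candidates, col_poc):
    """Return the starting point MV for ATMVP.

    Only the first available candidate is examined; if none of its MVs
    points to the collocated picture, the zero MV is used.
    """
    first = next((c for c in merge_candidates if c is not None), None)
    if first is not None:
        for lst in (0, 1):  # assumed order: list 0, then list 1
            entry = first.get(lst)
            if entry is not None and entry[1] == col_poc:
                return entry[0]
    return (0, 0)
```

Note that later candidates are never consulted, even if one of them points to the collocated picture; this is what bounds the check to a single pass.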

3.4 Derivation of Sub-Blocks' Motion Information

Two steps are performed in order to fill in all motion information of different sub-blocks.

1. Find default motion information:

    1. Identify a block based on the center position within the current block and the starting point MV in the collocated picture, i.e., a block covering (x0+W/2+(SPMV_X>>K), y0+H/2+(SPMV_Y>>K)), wherein (x0, y0) is the top-left sample's coordinate, (W, H) is the block's width and height, respectively, (SPMV_X, SPMV_Y) is the starting point MV, K represents the motion vector's precision, and (SPMV_X>>K, SPMV_Y>>K) denotes the integer MV.
    2. If the identified block is intra coded, the ATMVP process is terminated and the ATMVP candidate is set to unavailable.
    3. Otherwise (the identified block is inter coded), motion information of the identified block may be utilized to derive default motion information (e.g., scaled to certain reference pictures). The default motion information could be either uni-prediction or bi-prediction depending on the reference pictures.

FIG. 15 shows an example of how to identify the represented block for default motion derivation. The block covering the position (filled circle) in the collocated picture is the represented block for default motion derivation.

2. If default motion is found, for each sub-block, locate a representative block in the collocated picture based on the center position within the sub-block and the starting point MV.

    1. If the representative block is coded as inter-mode, the motion information of the representative block is utilized to derive the final sub-block's motion (i.e., scaled to certain reference pictures).
    2. If the representative block is coded as intra-mode, the sub-block's motion is set to the default motion information.
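The two steps can be sketched together as shown below. This is an outline under assumptions, not a codec implementation: `col_motion_at` is a hypothetical callback that returns the motion information of the block covering a position in the collocated picture, or `None` if that block is intra coded; the sub-block size is a parameter whose default of 8 is chosen only for illustration; and the scaling of fetched motion to particular reference pictures is omitted.

```python
def atmvp_subblock_motion(x0, y0, w, h, spmv, k, col_motion_at, sub=8):
    """Return {(sx, sy): motion} for every sub-block, or None if ATMVP is unavailable.

    (x0, y0): top-left sample of the current block; (w, h): its size.
    spmv: starting point MV; k: MV precision in bits (spmv >> k is the integer MV).
    """
    dx, dy = spmv[0] >> k, spmv[1] >> k
    # Step 1: default motion from the block covering the shifted block center.
    default = col_motion_at(x0 + w // 2 + dx, y0 + h // 2 + dy)
    if default is None:
        return None  # identified block is intra coded: candidate unavailable
    # Step 2: per-sub-block motion, falling back to the default motion.
    motion = {}
    for sy in range(0, h, sub):
        for sx in range(0, w, sub):
            cx = x0 + sx + sub // 2 + dx
            cy = y0 + sy + sub // 2 + dy
            m = col_motion_at(cx, cy)
            motion[(sx, sy)] = m if m is not None else default
    return motion
```

Because the default motion is established first, every sub-block is guaranteed to receive valid motion information once the candidate is declared available.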

4. Examples of Spatial-Temporal Motion Vector Prediction (STMVP)

In the STMVP method, the motion vectors of the sub-CUs are derived recursively, following raster scan order. FIG. 7 shows an example of one CU with four sub-blocks and neighboring blocks. Consider an 8×8 CU which contains four 4×4 sub-CUs A, B, C, and D. The neighbouring 4×4 blocks in the current frame are labelled as a, b, c, and d.

The motion derivation for sub-CU A starts by identifying its two spatial neighbours. The first neighbour is the N×N block above sub-CU A (block c). If this block c is not available or is intra coded, the other N×N blocks above sub-CU A are checked (from left to right, starting at block c). The second neighbour is a block to the left of the sub-CU A (block b). If block b is not available or is intra coded, other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b). The motion information obtained from the neighbouring blocks for each list is scaled to the first reference frame for a given list. Next, the temporal motion vector predictor (TMVP) of sub-block A is derived by following the same procedure of TMVP derivation as specified in HEVC. The motion information of the collocated block at location D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
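The final averaging step can be illustrated with a short sketch for one reference list. The name `stmvp_average` is hypothetical, the inputs are assumed to be already scaled, and the use of floor division for rounding is an assumption, since the text does not state a rounding rule.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]


def stmvp_average(above: Optional[MV], left: Optional[MV],
                  tmvp: Optional[MV]) -> Optional[MV]:
    """Average the available (already scaled) MVs of one reference list."""
    available = [mv for mv in (above, left, tmvp) if mv is not None]
    if not available:
        return None  # no predictor is available for this list
    n = len(available)  # between 1 and 3
    # Floor division is an assumed rounding rule for this sketch.
    return (sum(mv[0] for mv in available) // n,
            sum(mv[1] for mv in available) // n)
```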

5. Example Embodiments of Affine Merge Candidate Lists

5.1 Example Embodiments

In the affine merge mode of VTM-2.0.1, only the first available affine neighbour can be used to derive motion information of affine merge mode. In some embodiments, a candidate list for affine merge mode is constructed by searching valid affine neighbours and combining the neighbor motion information of each control point.

The affine merge candidate list is constructed by the following steps:

(1) Insert Inherited Affine Candidates

Inherited affine candidate means that the candidate is derived from the affine motion model of its valid neighbor affine coded block. In the common base, as shown in FIG. 8, the scan order for the candidate positions is: A1, B1, B0, A0 and B2.

After a candidate is derived, a full pruning process is performed to check whether the same candidate has already been inserted into the list. If the same candidate exists, the derived candidate is discarded.

(2) Insert Constructed Affine Candidates

If the number of candidates in the affine merge candidate list is less than MaxNumAffineCand (set to 5 in this example), constructed affine candidates are inserted into the candidate list. Constructed affine candidate means the candidate is constructed by combining the neighbor motion information of each control point.

The motion information for the control points is derived firstly from the specified spatial neighbors and temporal neighbor shown in FIG. 8. CPk (k=1, 2, 3, 4) represents the k-th control point. A0, A1, A2, B0, B1, B2 and B3 are spatial positions for predicting CPk (k=1, 2, 3); T is the temporal position for predicting CP4.

The coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0), (0, H) and (W, H), respectively, where W and H are the width and height of the current block.

The motion information of each control point is obtained according to the following priority order:

-   -   For CP1, the checking priority is B₂→B₃→A₂. B₂ is used if it is available. Otherwise, if B₃ is available, B₃ is used. If both B₂ and B₃ are unavailable, A₂ is used. If all three candidates are unavailable, the motion information of CP1 cannot be obtained.
    -   For CP2, the checking priority is B1→B0;
    -   For CP3, the checking priority is A1→A0;
    -   For CP4, T is used.
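The priority order can be expressed as a small lookup, shown here as a sketch. The names `CP_PRIORITY` and `control_point_motion` and the dictionary representation of the neighbor motion are assumptions for illustration; a position missing from the dictionary is treated as unavailable.

```python
from typing import Dict, Optional, Tuple

MV = Tuple[int, int]

# Checking priority per control point, first entry checked first.
CP_PRIORITY = {
    "CP1": ["B2", "B3", "A2"],
    "CP2": ["B1", "B0"],
    "CP3": ["A1", "A0"],
    "CP4": ["T"],
}


def control_point_motion(available: Dict[str, MV]) -> Dict[str, Optional[MV]]:
    """Pick, for each control point, the first available position in priority order."""
    result = {}
    for cp, order in CP_PRIORITY.items():
        # None means the motion information of this control point cannot be obtained.
        result[cp] = next((available[p] for p in order if p in available), None)
    return result
```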

Secondly, the combinations of control points are used to construct the motion model.

Motion information of three control points is needed to construct a 6-parameter affine candidate. The three control points can be selected from one of the following four combinations ({CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, {CP1, CP3, CP4}). Combinations {CP1, CP2, CP4}, {CP2, CP3, CP4}, {CP1, CP3, CP4} are converted to a 6-parameter motion model represented by top-left, top-right and bottom-left control points.

Motion information of two control points is needed to construct a 4-parameter affine candidate. The two control points can be selected from one of the following six combinations ({CP1, CP4}, {CP2, CP3}, {CP1, CP2}, {CP2, CP4}, {CP1, CP3}, {CP3, CP4}). Combinations {CP1, CP4}, {CP2, CP3}, {CP2, CP4}, {CP1, CP3}, {CP3, CP4} are converted to a 4-parameter motion model represented by top-left and top-right control points.

The combinations of constructed affine candidates are inserted into the candidate list in the following order:

-   -   {CP1, CP2, CP3}, {CP1, CP2, CP4}, {CP1, CP3, CP4}, {CP2, CP3, CP4}, {CP1, CP2}, {CP1, CP3}, {CP2, CP3}, {CP1, CP4}, {CP2, CP4}, {CP3, CP4}

For reference picture list X (X being 0 or 1) of a combination, the reference picture index with the highest usage ratio in the control points is selected as the reference picture index of list X, and motion vectors pointing to a different reference picture are scaled.

After a candidate is derived, a full pruning process is performed to check whether the same candidate has already been inserted into the list. If the same candidate exists, the derived candidate is discarded.

(3) Padding with Zero Motion Vectors

If the number of candidates in the affine merge candidate list is less than 5, zero motion vectors with zero reference indices are inserted into the candidate list until the list is full.
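Steps (1) to (3) can be summarized in a short sketch. Candidates are represented here by arbitrary comparable values, and the names `build_affine_merge_list` and `ZERO_CAND` are hypothetical; the derivation of the inherited and constructed candidates themselves is assumed to have happened elsewhere.

```python
from typing import Hashable, List, Sequence

MAX_NUM_AFFINE_CAND = 5          # list size used in this example
ZERO_CAND = ("zero_mv", 0)       # placeholder: zero MV with reference index 0


def build_affine_merge_list(inherited: Sequence[Hashable],
                            constructed: Sequence[Hashable]) -> List[Hashable]:
    """Insert inherited, then constructed candidates with full pruning, then pad."""
    merge_list: List[Hashable] = []
    for cand in list(inherited) + list(constructed):
        if len(merge_list) == MAX_NUM_AFFINE_CAND:
            break
        if cand not in merge_list:  # full pruning against every earlier entry
            merge_list.append(cand)
    while len(merge_list) < MAX_NUM_AFFINE_CAND:
        merge_list.append(ZERO_CAND)  # zero-motion padding until the list is full
    return merge_list
```

The `cand not in merge_list` test compares against every earlier entry, which is what makes the pruning "full" rather than limited to a few neighbors.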

Therefore, the complexity of this separate affine merge list is as follows:

Max Max inherited constructed Max Merge affine affine candidate MVAdditional list size candidate candidate comparison scaling buffer 5 1 60 2 2x

6. Examples of a Sub-Block Merge Candidate List

In some embodiments, all the sub-block related motion candidates are put in a separate merge list in addition to the regular merge list for non-sub-block merge candidates. For example, the separate merge list that holds the sub-block related motion candidates is named the ‘sub-block merge list’. In one example, the sub-block merge list includes affine merge candidates, an ATMVP candidate, and/or a sub-block based STMVP candidate.

6.1 Example Embodiments

In some embodiments, the ATMVP merge candidate in the normal merge list is moved to the first position of the affine merge list, such that all the merge candidates in the new list (e.g., the sub-block based merge candidate list) are based on sub-block coding tools.

7. Drawbacks of Existing Methods

The idea of using the first available spatial merge candidate is beneficial when the ATMVP candidate is added to the regular merge mode. When the ATMVP candidate is added to the sub-block based merge list, however, it still requires going through the regular merge list construction process, which defeats the motivation for adding ATMVP to the sub-block based merge list, that is, reducing the interaction between the sub-block merge list and the regular merge list. In the worst case, it is still required to check the availability of four spatial neighboring blocks and to check whether each of them is intra coded.

In some embodiments, an ATMVP candidate is always inserted into the merge list before affine motion candidates, which may not be efficient for sequences with affine motion.

In some embodiments, an ATMVP candidate may be unavailable after checking the temporal block in the co-located picture. Therefore, a given merge index, e.g., equal to 0, may represent either an ATMVP candidate or an affine merge candidate, which is not compatible with a simplified hardware implementation.

8. Example Methods for Simplifying Sub-Block Motion Candidate Lists

Embodiments of the disclosed technology simplify the generation of sub-block motion candidate lists, which may improve video coding efficiency and enhance both existing and future video coding standards. They are elucidated in the following examples described for various implementations. In the following examples, which should not be construed to be limiting, the term ‘ATMVP’ is not restricted to ‘sub-block based ATMVP’; it could also represent ‘non-sub-block based ATMVP’, which could also be interpreted as a TMVP candidate. In addition, the following methods may also be applied to other motion candidate list construction processes, such as the AMVP candidate list or the regular merge candidate list with non-sub-block merge candidates.

Furthermore, in the following examples, a motion category is defined as including all motion candidates derived with the same coding tool. In other words, for each coding tool, such as affine, ATMVP, or STMVP, the corresponding motion candidates belong to a single motion category.

Example 1

Instead of finding the first available merge candidate in the regular merge list for the ATMVP candidate derivation, the motion information of only one spatial neighboring block may be accessed and utilized in the ATMVP candidate derivation process. For example, if the motion information of the only spatial neighboring block is available, the ATMVP candidate can be determined based on such motion information. As another example, if the motion information of the only spatial neighboring block is not available, the ATMVP candidate can be determined based on default motion information, such as a zero motion vector.

In some embodiments, the only spatial neighboring block is defined as the first spatial neighboring block to be checked in the regular merge list, such as A₁ depicted in FIG. 5.

In some embodiments, the only spatial neighbouring block is defined as the first available spatial neighbouring block in a checking order, such as A1, B1, B0, A0, B2. For example, when a neighboring block exists and has been coded when coding the current block, it is treated as available. In some embodiments, when a neighboring block exists in the same tile and has been coded when coding the current block, it is treated as available. In one example, the neighbouring blocks to be checked in order are A1, B1.

In some embodiments, the only spatial neighboring block may be different from those used in the regular merge mode derivation process.

In some embodiments, the motion information of the first K spatial neighbouring blocks may be accessed. In one example, K is equal to 2 or 3.

The checking order of spatial neighbouring blocks may be the same as or different from that used in the regular merge list derivation process.
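One possible reading of Example 1 is sketched below: at most the first K positions of a checking order are accessed, and the zero motion vector serves as the default. The function name `starting_point_mv`, the dictionary of neighbor motion, and the default order `("A1",)` are assumptions for illustration only.

```python
from typing import Dict, Optional, Sequence, Tuple

MV = Tuple[int, int]


def starting_point_mv(neighbors: Dict[str, Optional[MV]],
                      order: Sequence[str] = ("A1",),
                      k: int = 1) -> MV:
    """Access at most the first k positions of `order`; default to the zero MV."""
    for pos in order[:k]:
        mv = neighbors.get(pos)  # None: block missing, not yet coded, or no motion
        if mv is not None:
            return mv
    return (0, 0)  # default motion information
```

With `k=1` and `order=("A1",)` this reduces to the single-neighbor case, so the regular merge list never has to be constructed.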

Example 2

In some embodiments, the ATMVP candidates may be derived from a temporal block identified by motion information of a spatial neighboring block of the coding unit that is not used in the regular merge list derivation process.

In some embodiments, the spatial neighboring blocks used in the ATMVP process can be totally different from those used in the regular merge list derivation process. For example, blocks B3, A2, A3 in FIG. 5 can be used.

In some embodiments, part of the spatial neighboring blocks used in the ATMVP process may be the same as those used in the regular merge list derivation process while the remaining are different. For example, blocks A1, B1, B3, A2, A3 as shown in FIG. 5 can be used.

In some embodiments, the motion information of selected spatial neighboring block(s) may be further scaled before identifying the temporal block.

Example 3

In some embodiments, instead of relying on motion information of a neighboring block, a History-based MV Prediction (HMVP) candidate fetched from an HMVP table or list can be used to derive the ATMVP candidate. History-based Motion Vector Prediction (HMVP) methods, e.g., as described in PCT/CN2018/105193 and PCT/CN2018/093987, use previously coded motion information for prediction. That is, an ATMVP candidate can be derived based on a table of motion candidates (e.g., can include ATMVP candidates and non-ATMVP candidates) derived during the video processing. The derived ATMVP candidate for the current coding unit can be used to update the table of motion candidates. For example, the derived ATMVP candidate can be added to the table after pruning is performed. Subsequent processing can be performed based on the updated table of motion candidates.

In some embodiments, scaling may be applied to the HMVP candidate.

Example 4

In some embodiments, usage of neighbouring block(s) or HMVP(s) to derive the ATMVP candidate may be adaptive.

In some embodiments, which block(s) are used may be signaled from the encoder to the decoder in VPS/SPS/PPS/slice header/tile group header/tile/CTU/CU/PU/CTU row.

In some embodiments, which block(s) are used may depend on the width and/or height of the current block. FIG. 9 shows examples of spatial neighboring blocks used for ATMVP temporal block identification.

Example 5

When the temporal block identified in the ATMVP process (such as the one pointed to by the (scaled) motion vector from the first available merge candidate in the current design, or by a zero motion vector) cannot return a valid ATMVP candidate (e.g., the temporal block is intra-coded), more temporal blocks may be searched until one or multiple ATMVP candidates are found.

The bottom-right of the identified temporal block may be further checked. An example is depicted in FIG. 10. FIG. 10 shows examples of an alternative starting point identified by the bottom-right block of the starting point found by prior art.

In some embodiments, a searching order may be defined, e.g., from the neighboring left, above, right, bottom of the identified temporal block; then non-adjacent left, above, right, bottom of the identified temporal block with a step, and so on.
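A searching order of this kind could be generated as sketched below. The geometry is an assumption made for illustration: positions are top-left coordinates of candidate temporal blocks, the adjacent blocks sit one block width or height away, and each further ring moves outward by `step` samples. The name `temporal_search_positions` is hypothetical.

```python
from typing import Iterator, Tuple


def temporal_search_positions(x: int, y: int, w: int, h: int,
                              step: int, rounds: int) -> Iterator[Tuple[int, int]]:
    """Yield candidate positions: left, above, right, bottom, ring by ring."""
    for r in range(rounds):
        dx = w + r * step  # r == 0 gives the adjacent blocks
        dy = h + r * step
        yield (x - dx, y)  # left
        yield (x, y - dy)  # above
        yield (x + dx, y)  # right
        yield (x, y + dy)  # bottom
```

A caller would stop at the first position that yields a valid ATMVP candidate, and could additionally discard positions that fall outside an allowed region.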

In one example, all the temporal blocks to be checked shall be within a certain region, such as within the same CTU as the identified temporal block, or within the same CTU row as the identified temporal block.

In some embodiments, if there is no available ATMVP candidate after checking the identified temporal block and/or more temporal blocks, a default ATMVP candidate may be utilized.

In one example, the default ATMVP candidate may be defined as a motion candidate inherited from a spatial neighboring block. In some embodiments, the motion candidate inherited from a spatial neighboring block may be further scaled.

In some embodiments, the default ATMVP candidate may be derived from the starting point MV.

i. Example 1 may be utilized to find the starting point MV.

ii. In one example, the starting point MV may be a motion vector associated with a spatially adjacent, non-adjacent, or temporal block whose corresponding reference picture is the collocated reference picture.

iii. In one example, the starting point MV may be a motion vector associated with the first spatial block whose corresponding reference picture is the collocated reference picture.

iv. In one example, the starting point MV may be a zero motion vector.

v. In one example, the starting point MV may be defined in the same way as in the current VVC design, that is, if the first spatial neighboring block (e.g., with a checking order of A1, B1, B0, A0, B2) is inter-coded and its motion vector of List X (X being 0 or 1) points to the collocated picture, the starting point MV is set to the associated MV of the first spatial neighboring block for List X. Otherwise, the starting point MV is set to a zero motion vector.

vi. In one example, when the associated motion of the represented block identified by the starting point MV and the center position of the current block is unavailable (e.g., the represented block is intra-coded or the represented block is unavailable (e.g., out of the restricted region)), the starting point MV is treated as the motion information of the represented block. In some embodiments, default motion information is derived from the starting point MV (i.e., from the motion information of the represented block).

vii. In some embodiments, furthermore, for any sub-block, if the associated motion of its represented block identified by the starting point MV and the center position of the current sub-block is unavailable, the starting point MV is treated as the motion information of the represented block and utilized to derive the sub-block motion.

In one example, the default ATMVP candidate may be set to zero motion vectors. In some embodiments, furthermore, the reference picture associated with the ATMVP candidate may be set to the collocated picture.

Example 6

When a motion vector is utilized to derive default motion information for the ATMVP candidate (i.e., default ATMVP candidate), a uni-prediction candidate may be derived by scaling a starting motion vector to a reference picture index within the reference picture list X. That is, the default ATMVP candidate is a uni-prediction candidate.

In one example, the reference picture index is set to 0.

In one example, the reference picture index is set to the smallest reference picture index that corresponds to a short-term reference picture.

In one example, the reference picture index is set to the one that is used by the TMVP candidate for reference picture list X.

In one example, the reference picture list X is set to List 0 or List 1.

In one example, the reference picture list X is dependent on the slice/picture type and/or the reference picture list that the collocated picture is from.

In one example, X is set to List (B Slice/picture ? 1−getColFromL0Flag( ) : 0). The function getColFromL0Flag( ) returns 1 when the collocated picture is from List 0, and returns 0 when the collocated picture is from List 1.
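The selection rule for X in the last example can be written out directly. The function name `default_ref_list` and its parameters are hypothetical stand-ins for the slice type and the value returned by getColFromL0Flag( ).

```python
def default_ref_list(is_b_slice: bool, col_from_l0_flag: int) -> int:
    """Return X as (B slice/picture ? 1 - getColFromL0Flag() : 0)."""
    # For B slices, X is the list the collocated picture comes from:
    # flag 1 (collocated picture from List 0) gives X = 0, flag 0 gives X = 1.
    return 1 - col_from_l0_flag if is_b_slice else 0
```

For non-B slices only List 0 exists, so X is always 0 regardless of the flag.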

Example 7

When a motion vector is utilized to derive default motion information for the ATMVP candidate (i.e., default ATMVP candidate), a bi-prediction candidate may be derived by scaling the motion vector to certain reference picture indices within two reference picture lists. That is, the default ATMVP candidate is a bi-prediction candidate.

For each reference picture, a certain reference picture index is selected. In one example, it may be defined to be the same as that used for the target reference picture index (e.g., 0 in the current VVC design) of the TMVP candidate.

Example 8

Whether to set the default motion information to a uni-prediction or bi-prediction candidate may depend on the picture/slice type. In some embodiments, it may depend on the block dimension. In one example, if there are fewer than 64 samples, uni-prediction default motion information is utilized in the ATMVP process.

Example 9

The final merge candidate list includes at least one candidate for each motion category. A motion category can be a temporal motion vector prediction candidate category, an affine motion candidate category, or other types of categories. In some embodiments, at least one ATMVP candidate is always included in the merge candidate list. In some embodiments, at least one affine candidate is always included in the merge candidate list.

Example 10

A merge index may be assigned to a given motion category. When the merge index is known, the decoder can be ready to load information from a branch corresponding to this motion category.

For example, merge indices within the range [m, n], inclusive, may correspond to ATMVP candidates. Merge indices within the range [k, l], inclusive, may correspond to affine candidates. In one example, m=n=0, k=1, l>=k.
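The fixed mapping from merge index to motion category can be sketched as follows. The function name `category_for_merge_index` is hypothetical, and the default upper bound of the affine range (`l_hi=4`) is an assumed value chosen only so the example runs.

```python
def category_for_merge_index(idx: int, m: int = 0, n: int = 0,
                             k: int = 1, l_hi: int = 4) -> str:
    """Map a merge index to its motion category using fixed inclusive ranges."""
    if m <= idx <= n:
        return "ATMVP"
    if k <= idx <= l_hi:  # l_hi is the upper bound of the affine range (assumed 4)
        return "affine"
    raise ValueError(f"merge index {idx} is outside all assigned ranges")
```

Because the mapping depends only on the index, a decoder can select the processing branch before any candidate is actually derived.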

In some embodiments, the assigned index(s) may be adaptive. In one example, the assigned index(s) may be signaled from the encoder to the decoder in VPS/SPS/PPS/slice header/tile group header/tile/CTU/CU/PU/CTU row. In one example, the assigned index(s) may depend on the width and/or height of the current block.

Example 11

When multiple ATMVP candidates are added to the merge candidate list (e.g., the sub-block merge candidate list), affine motion candidates can be added before all ATMVP candidates. In some embodiments, ATMVP candidates and affine motion candidates may be inserted in an interleaved way, i.e., one or more affine motion candidates are before an ATMVP candidate and some are after.

Example 12

The order of affine motion candidates and non-affine motion candidates (e.g., ATMVP and/or STMVP candidates) may be adaptively changed from block to block, or from tile to tile, or from picture to picture, or from sequence to sequence.

The adaptive order may depend on the neighboring blocks' coded information and/or coded information of the current block. In one example, if all or a majority of the selected neighboring blocks are coded with affine mode, affine motion candidates may be added before other non-affine motion candidates.

The adaptive order may depend on the number of available affine motion candidates and/or the number of non-affine candidates. In one example, if the ratio between the number of available affine motion candidates and non-affine candidates is larger than a threshold, affine motion candidates may be inserted before non-affine motion candidates.
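The ratio-based ordering can be illustrated with a small sketch. The function name `order_subblock_candidates`, the default threshold of 1.0, and the handling of an empty non-affine set are assumptions made for this example.

```python
from typing import List, Sequence


def order_subblock_candidates(affine: Sequence[str], non_affine: Sequence[str],
                              threshold: float = 1.0) -> List[str]:
    """Put affine candidates first when their ratio to non-affine exceeds threshold."""
    if not non_affine:  # ratio undefined; only affine candidates exist (assumed rule)
        return list(affine)
    if len(affine) / len(non_affine) > threshold:
        return list(affine) + list(non_affine)
    return list(non_affine) + list(affine)
```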

The adaptive order may be applicable only to the first K affine motion candidates (e.g., K is set to 1). In this case, only for the first K affine motion candidates is it adaptively decided whether they are inserted before or after non-affine motion candidates.

When there are more than two categories (the current design has only affine and ATMVP candidates), the adaptive order of inserting different motion candidates can still be applied.

Example 13

An indication of sub-block related technologies can be signaled in picture header/PPS/slice header/tile group header. When the indication tells that a sub-block related technology is disabled, there is no need to signal any related information for such a technology at block level.

In one example, an indication (such as a flag) of ATMVP at picture header/slice header/tile header may be signaled.

In one example, an indication (such as a flag) of affine at picture header/slice header/tile header may be signaled.

Example 14

The order of motion candidates for different motion categories (e.g., ATMVP, affine, STMVP) may be pre-defined or signaled in SPS/VPS/PPS/picture header/tile group header/slice, etc.

In one example, a flag may be signaled to indicate whether affine motion candidates should be after all non-affine motion candidates.

In one example, a flag may be signaled to indicate whether ATMVP motion candidates should be before all affine motion candidates.

Example 15

It is desirable to unify the ATMVP sub-block motion derivation process and the TMVP process. In one example, the sub-block motion derivation process reuses the TMVP process. In one example, the TMVP process reuses the sub-block motion derivation process.

Example 16

For the sub-block merge candidate list, the ATMVP candidate can always be available and temporal information is disallowed from being used to derive affine candidates. In one example, a merge index to the sub-block merge candidate list equal to 0 always corresponds to an ATMVP candidate. In one example, a merge index to the sub-block merge candidate list unequal to 0 always corresponds to an affine candidate.

9. Additional Embodiment Examples

This section gives an embodiment of how to make the ATMVP candidate always available. The changes compared to the latest VVC specification are bold (for newly added) or italicized (for deleted).

9.1 Example #1 (a Uni-Prediction Default ATMVP Candidate to Fill in Sub-Blocks if Needed)

8.3.4.4 Derivation process for subblock-based temporal merging base motion data (note: default motion information)

Inputs to this process are:

-   -   the location (xCtb, yCtb) of the top-left sample of the luma coding tree block that contains the current coding block,
    -   the location (xColCtrCb, yColCtrCb) of the top-left sample of the collocated luma coding block that covers the below-right center sample,
    -   the availability flags availableFlagA₀, availableFlagA₁, availableFlagB₀, and availableFlagB₁ of the neighbouring coding units,
    -   the reference indices refIdxLXA₀, refIdxLXA₁, refIdxLXB₀, and refIdxLXB₁ of the neighbouring coding units,
    -   the prediction list utilization flags predFlagLXA₀, predFlagLXA₁, predFlagLXB₀, and predFlagLXB₁ of the neighbouring coding units,
    -   the motion vectors in 1/16 fractional-sample accuracy mvLXA₀, mvLXA₁, mvLXB₀, and mvLXB₁ of the neighbouring coding units.

Outputs of this process are:

-   -   the motion vectors ctrMvL0 and ctrMvL1,
    -   the prediction list utilization flags ctrPredFlagL0 and ctrPredFlagL1,
    -   the reference indices ctrRefIdxL0 and ctrRefIdxL1,
    -   the temporal motion vector tempMV.

The variable tempMv is set as follows:

tempMv[0]=0  (8-501)

tempMv[1]=0  (8-502)

The variable currPic specifies the current picture.

The variable availableFlagN is set equal to FALSE, and the following applies:

-   -   When availableFlagA₀ is equal to 1, the following applies:
        -   availableFlagN is set equal to TRUE,
        -   refIdxLXN is set equal to refIdxLXA₀ and mvLXN is set equal to mvLXA₀, for X being replaced by 0 and 1.
    -   When availableFlagN is equal to FALSE and availableFlagB₀ is equal to 1, the following applies:
        -   availableFlagN is set equal to TRUE,
        -   refIdxLXN is set equal to refIdxLXB₀ and mvLXN is set equal to mvLXB₀, for X being replaced by 0 and 1.
    -   When availableFlagN is equal to FALSE and availableFlagB₁ is equal to 1, the following applies:
        -   availableFlagN is set equal to TRUE,
        -   refIdxLXN is set equal to refIdxLXB₁ and mvLXN is set equal to mvLXB₁, for X being replaced by 0 and 1.
    -   When availableFlagN is equal to FALSE and availableFlagA₁ is equal to 1, the following applies:
        -   availableFlagN is set equal to TRUE,
        -   refIdxLXN is set equal to refIdxLXA₁ and mvLXN is set equal to mvLXA₁, for X being replaced by 0 and 1.

tempMV is set to zero motion vector.

When availableFlagN is equal to TRUE, the following applies:

-   -   If all of the following conditions are true, tempMV is set equal to mvL1N:
        -   predFlagL1N is equal to 1,
        -   DiffPicOrderCnt(ColPic, RefPicList1[refIdxL1N]) is equal to 0,
        -   DiffPicOrderCnt(aPic, currPic) is less than or equal to 0 for every picture aPic in every reference picture list of the current slice,
        -   slice_type is equal to B,
        -   collocated_from_l0_flag is equal to 0.
    -   Otherwise, if all of the following conditions are true, tempMV is set equal to mvL0N:
        -   predFlagL0N is equal to 1,
        -   DiffPicOrderCnt(ColPic, RefPicList0[refIdxL0N]) is equal to 0.

The location (xColCb, yColCb) of the collocated block inside ColPic is derived as follows.

xColCb = Clip3(xCtb, Min(CurPicWidthInSamplesY − 1, xCtb + (1 << CtbLog2SizeY) + 3), xColCtrCb + (tempMv[0] >> 4))  (8-503)

yColCb = Clip3(yCtb, Min(CurPicHeightInSamplesY − 1, yCtb + (1 << CtbLog2SizeY) − 1), yColCtrCb + (tempMv[1] >> 4))  (8-504)
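Equations (8-503) and (8-504) can be checked numerically with the sketch below. Clip3(lo, hi, v) is taken to clamp v to the range [lo, hi]; the Python function and parameter names are illustrative stand-ins for the specification variables.

```python
from typing import Tuple


def clip3(lo: int, hi: int, v: int) -> int:
    """Clamp v to the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))


def collocated_location(x_ctb: int, y_ctb: int, x_col_ctr: int, y_col_ctr: int,
                        temp_mv: Tuple[int, int], pic_w: int, pic_h: int,
                        ctb_log2_size: int) -> Tuple[int, int]:
    """Compute (xColCb, yColCb) per (8-503) and (8-504); temp_mv is in 1/16 sample units."""
    ctb = 1 << ctb_log2_size
    x = clip3(x_ctb, min(pic_w - 1, x_ctb + ctb + 3), x_col_ctr + (temp_mv[0] >> 4))
    y = clip3(y_ctb, min(pic_h - 1, y_ctb + ctb - 1), y_col_ctr + (temp_mv[1] >> 4))
    return x, y
```

The clamp keeps the fetched position inside the current CTB, extended by a few columns to the right, and inside the picture, which bounds the region of collocated motion data that has to be accessible.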

The array colPredMode is set equal to the prediction mode array CuPredMode of the collocated picture specified by ColPic.

The motion vectors ctrMvL0 and ctrMvL1, the prediction list utilization flags ctrPredFlagL0 and ctrPredFlagL1, and the reference indices ctrRefIdxL0 and ctrRefIdxL1 are derived as follows:

-   -   Set ctrPredFlagL0=0, ctrPredFlagL1=0.
    -   If colPredMode[xColCb][yColCb] is equal to MODE_INTER, the following applies:
        -   The variable currCb specifies the luma coding block covering (xCtrCb, yCtrCb) inside the current picture.
        -   The variable colCb specifies the luma coding block covering the modified location given by ((xColCb>>3)<<3, (yColCb>>3)<<3) inside the ColPic.
        -   The luma location (xColCb, yColCb) is set equal to the top-left sample of the collocated luma coding block specified by colCb relative to the top-left luma sample of the collocated picture specified by ColPic.
        -   The derivation process for temporal motion vector prediction in subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb), centerRefIdxL0, and sbFlag set equal to 1 as inputs and the output being assigned to ctrMvL0 and ctrPredFlagL0.
        -   The derivation process for temporal motion vector prediction in subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb), centerRefIdxL1, and sbFlag set equal to 1 as inputs and the output being assigned to ctrMvL1 and ctrPredFlagL1.
    -   If both ctrPredFlagL0 and ctrPredFlagL1 are equal to 0, the following applies:
        -   Set target reference picture list index X=slice.isInterB( ) ? 1−slice.getColFromL0Flag( ) : 0
        -   Scale tempMV to reference picture list X and reference picture index equal to 0 and set ctrMvLX to the scaled tempMV.
        -   ctrPredFlagLX=1.
    -   Otherwise, the following applies:

ctrPredFlagL0=0  (8-505)

ctrPredFlagL1=0  (8-506)

Example #2 (a Bi-Prediction Default ATMVP Candidate to Fill in Sub-Blocks if Needed)

8.3.4.4 Derivation Process for Subblock-Based Temporal Merging Base Motion Data (Note: Default Motion Information)

Inputs to this process are:

-   -   the location (xCtb, yCtb) of the top-left sample of the luma coding tree block that contains the current coding block,
    -   the location (xColCtrCb, yColCtrCb) of the top-left sample of the collocated luma coding block that covers the below-right center sample,
    -   the availability flags availableFlagA₀, availableFlagA₁, availableFlagB₀, and availableFlagB₁ of the neighbouring coding units,
    -   the reference indices refIdxLXA₀, refIdxLXA₁, refIdxLXB₀, and refIdxLXB₁ of the neighbouring coding units,
    -   the prediction list utilization flags predFlagLXA₀, predFlagLXA₁, predFlagLXB₀, and predFlagLXB₁ of the neighbouring coding units,
    -   the motion vectors in 1/16 fractional-sample accuracy mvLXA₀, mvLXA₁, mvLXB₀, and mvLXB₁ of the neighbouring coding units.

Outputs of this process are:

-   -   the motion vectors ctrMvL0 and ctrMvL1,
    -   the prediction list utilization flags ctrPredFlagL0 and ctrPredFlagL1,
    -   the reference indices ctrRefIdxL0 and ctrRefIdxL1,
    -   the temporal motion vector tempMV.

The variable tempMv is set as follows:

tempMv[0]=0  (8-501)

tempMv[1]=0  (8-502)

The variable currPic specifies the current picture.

The variable availableFlagN is set equal to FALSE, and the following applies:

-   -   When availableFlagA₀ is equal to 1, the following applies:
        -   availableFlagN is set equal to TRUE,
        -   refIdxLXN is set equal to refIdxLXA₀ and mvLXN is set equal to mvLXA₀, for X being replaced by 0 and 1.
    -   When availableFlagN is equal to FALSE and availableFlagB₀ is equal to 1, the following applies:
        -   availableFlagN is set equal to TRUE,
        -   refIdxLXN is set equal to refIdxLXB₀ and mvLXN is set equal to mvLXB₀, for X being replaced by 0 and 1.
    -   When availableFlagN is equal to FALSE and availableFlagB₁ is equal to 1, the following applies:
        -   availableFlagN is set equal to TRUE,
        -   refIdxLXN is set equal to refIdxLXB₁ and mvLXN is set equal to mvLXB₁, for X being replaced by 0 and 1.
    -   When availableFlagN is equal to FALSE and availableFlagA₁ is equal to 1, the following applies:
        -   availableFlagN is set equal to TRUE,
        -   refIdxLXN is set equal to refIdxLXA₁ and mvLXN is set equal to mvLXA₁, for X being replaced by 0 and 1.

tempMV is set to zero motion vector.

When availableFlagN is equal to TRUE, the following applies:

-   -   If all of the following conditions are true, tempMV is set equal to mvL1N:
        -   predFlagL1N is equal to 1,
        -   DiffPicOrderCnt(ColPic, RefPicList1[refIdxL1N]) is equal to 0,
        -   DiffPicOrderCnt(aPic, currPic) is less than or equal to 0 for every picture aPic in every reference picture list of the current slice,
        -   slice_type is equal to B,
        -   collocated_from_l0_flag is equal to 0.
    -   Otherwise, if all of the following conditions are true, tempMV is set equal to mvL0N:
        -   predFlagL0N is equal to 1,
        -   DiffPicOrderCnt(ColPic, RefPicList0[refIdxL0N]) is equal to 0.

The location (xColCb, yColCb) of the collocated block inside ColPic is derived as follows.

xColCb=Clip3(xCtb, Min(CurPicWidthInSamplesY−1, xCtb+(1<<CtbLog2SizeY)+3), xColCtrCb+(tempMv[0]>>4))  (8-503)

yColCb=Clip3(yCtb, Min(CurPicHeightInSamplesY−1, yCtb+(1<<CtbLog2SizeY)−1), yColCtrCb+(tempMv[1]>>4))  (8-504)
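As an illustration of equations (8-503) and (8-504), the following minimal Python sketch clamps the displaced center location to the current CTB row region. The function and parameter names are hypothetical and chosen only for this example; Python's `>>` on negative integers is an arithmetic shift, which is assumed to match the intended behaviour of the 1/16-sample to integer-sample conversion.

```python
def clip3(lo, hi, v):
    """Clip3(x, y, z): clamp z into the inclusive range [x, y]."""
    return lo if v < lo else hi if v > hi else v


def collocated_location(x_ctb, y_ctb, x_col_ctr_cb, y_col_ctr_cb,
                        temp_mv, ctb_log2_size_y,
                        pic_width_y, pic_height_y):
    """Return (xColCb, yColCb) following equations (8-503) and (8-504).

    temp_mv is (tempMv[0], tempMv[1]) in 1/16 fractional-sample units.
    """
    ctb_size = 1 << ctb_log2_size_y
    x_col_cb = clip3(x_ctb,
                     min(pic_width_y - 1, x_ctb + ctb_size + 3),
                     x_col_ctr_cb + (temp_mv[0] >> 4))
    y_col_cb = clip3(y_ctb,
                     min(pic_height_y - 1, y_ctb + ctb_size - 1),
                     y_col_ctr_cb + (temp_mv[1] >> 4))
    return x_col_cb, y_col_cb
```

In this sketch a large vertical displacement is clamped to the last row of the current CTB, while the horizontal range extends a few samples past the CTB, as the two upper bounds in the equations indicate.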

The array colPredMode is set equal to the prediction mode array CuPredMode of the collocated picture specified by ColPic.

The motion vectors ctrMvL0 and ctrMvL1, the prediction list utilization flags ctrPredFlagL0 and ctrPredFlagL1, and the reference indices ctrRefIdxL0 and ctrRefIdxL1 are derived as follows:

-   Set ctrPredFlagL0=0, ctrPredFlagL1=0.
-   If colPredMode[xColCb][yColCb] is equal to MODE_INTER, the following applies:
    -   The variable currCb specifies the luma coding block covering (xCtrCb, yCtrCb) inside the current picture.
    -   The variable colCb specifies the luma coding block covering the modified location given by ((xColCb>>3)<<3, (yColCb>>3)<<3) inside the ColPic.
    -   The luma location (xColCb, yColCb) is set equal to the top-left sample of the collocated luma coding block specified by colCb relative to the top-left luma sample of the collocated picture specified by ColPic.
    -   The derivation process for temporal motion vector prediction in subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb), centerRefIdxL0, and sbFlag set equal to 1 as inputs and the output being assigned to ctrMvL0 and ctrPredFlagL0.
    -   The derivation process for temporal motion vector prediction in subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb), centerRefIdxL1, and sbFlag set equal to 1 as inputs and the output being assigned to ctrMvL1 and ctrPredFlagL1.
-   If both ctrPredFlagL0 and ctrPredFlagL1 are equal to 0, the following applies:
    -   Set target reference picture list index X=0.
    -   Scale tempMV to reference picture list X and reference picture index equal to 0 and set ctrMvLX to the scaled tempMV.
    -   ctrPredFlagLX=1.
    -   If current slice/picture is B slice,
        1. Set target reference picture list index X=1.
        2. Scale tempMV to reference picture list X and reference picture index equal to 0 and set ctrMvLX to the scaled tempMV.
        3. ctrPredFlagLX=1.
-   Otherwise, the following applies:

ctrPredFlagL0=0  (8-505)

ctrPredFlagL1=0  (8-506)

Example 3: ATMVP Candidate Starting Point MV from One Spatial Block

8.3.4.4 Derivation Process for Subblock-Based Temporal Merging Base Motion Data (Note: Default Motion Information)

Inputs to this process are:

-   the location (xCtb, yCtb) of the top-left sample of the luma coding tree block that contains the current coding block,
-   the location (xColCtrCb, yColCtrCb) of the top-left sample of the collocated luma coding block that covers the below-right center sample,
-   the availability flags availableFlagA₀, availableFlagA₁, availableFlagB₀, and availableFlagB₁ of the neighbouring coding units,
-   the reference indices refIdxLXA₀, refIdxLXA₁, refIdxLXB₀, and refIdxLXB₁ of the neighbouring coding units,
-   the prediction list utilization flags predFlagLXA₀, predFlagLXA₁, predFlagLXB₀, and predFlagLXB₁ of the neighbouring coding units,
-   the motion vectors in 1/16 fractional-sample accuracy mvLXA₀, mvLXA₁, mvLXB₀, and mvLXB₁ of the neighbouring coding units.

Outputs of this process are:

-   the motion vectors ctrMvL0 and ctrMvL1,
-   the prediction list utilization flags ctrPredFlagL0 and ctrPredFlagL1,
-   the reference indices ctrRefIdxL0 and ctrRefIdxL1,
-   the temporal motion vector tempMV.

The variable tempMv is set as follows:

tempMv[0]=0  (8-501)

tempMv[1]=0  (8-502)

The variable currPic specifies the current picture.

The variable availableFlagN is set equal to FALSE, and the following applies:

-   When availableFlagA₀ is equal to 1, the following applies:
    -   availableFlagN is set equal to TRUE,
    -   refIdxLXN is set equal to refIdxLXA₀ and mvLXN is set equal to mvLXA₀, for X being replaced by 0 and 1.
-   When availableFlagN is equal to FALSE and availableFlagB₀ is equal to 1, the following applies:
    -   availableFlagN is set equal to TRUE,
    -   refIdxLXN is set equal to refIdxLXB₀ and mvLXN is set equal to mvLXB₀, for X being replaced by 0 and 1.
-   When availableFlagN is equal to FALSE and availableFlagB₁ is equal to 1, the following applies:
    -   availableFlagN is set equal to TRUE,
    -   refIdxLXN is set equal to refIdxLXB₁ and mvLXN is set equal to mvLXB₁, for X being replaced by 0 and 1.
-   When availableFlagN is equal to FALSE and availableFlagA₁ is equal to 1, the following applies:
    -   availableFlagN is set equal to TRUE,
    -   refIdxLXN is set equal to refIdxLXA₁ and mvLXN is set equal to mvLXA₁, for X being replaced by 0 and 1.

When availableFlagN is equal to TRUE, the following applies:

-   . . .
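The ordered availability check listed above can be read as "take the motion of the first available neighbour in the order A₀, B₀, B₁, A₁". The Python sketch below illustrates only that reading; the `NeighbourMotion` container, its field names and the function name are hypothetical and do not appear in the specification text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MV = Tuple[int, int]


@dataclass
class NeighbourMotion:
    """Hypothetical container for one neighbouring coding unit."""
    available: bool
    ref_idx: Tuple[int, int] = (-1, -1)   # refIdxL0, refIdxL1
    mv: Tuple[MV, MV] = ((0, 0), (0, 0))  # mvL0, mvL1 in 1/16 sample units


# Checking order as listed in the text above.
CHECK_ORDER = ("A0", "B0", "B1", "A1")


def first_available_neighbour(
        neighbours: Dict[str, NeighbourMotion]
) -> Optional[Tuple[str, NeighbourMotion]]:
    """Return (name, motion) of the first available neighbour, or None.

    A None result corresponds to availableFlagN staying FALSE.
    """
    for name in CHECK_ORDER:
        cand = neighbours.get(name)
        if cand is not None and cand.available:
            return name, cand
    return None
```

Because each later step is guarded by "availableFlagN is equal to FALSE", the scan stops at the first hit, so at most one neighbour contributes motion.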

Example #4 (Alignment of Sub-Block and TMVP Process)

8.3.4.4 Derivation Process for Subblock-Based Temporal Merging Base Motion Data

Inputs to this process are:

-   the location (xCtb, yCtb) of the top-left sample of the luma coding tree block that contains the current coding block,
-   the location (xColCtrCb, yColCtrCb) of the top-left sample of the collocated luma coding block that covers the below-right center sample,
-   the availability flags availableFlagA₀, availableFlagA₁, availableFlagB₀, and availableFlagB₁ of the neighbouring coding units,
-   the reference indices refIdxLXA₀, refIdxLXA₁, refIdxLXB₀, and refIdxLXB₁ of the neighbouring coding units,
-   the prediction list utilization flags predFlagLXA₀, predFlagLXA₁, predFlagLXB₀, and predFlagLXB₁ of the neighbouring coding units,
-   the motion vectors in 1/16 fractional-sample accuracy mvLXA₀, mvLXA₁, mvLXB₀, and mvLXB₁ of the neighbouring coding units.

Outputs of this process are:

-   the motion vectors ctrMvL0 and ctrMvL1,
-   the prediction list utilization flags ctrPredFlagL0 and ctrPredFlagL1,
-   the reference indices ctrRefIdxL0 and ctrRefIdxL1,
-   the temporal motion vector tempMV.

The variable tempMv is set as follows:

tempMv[0]=0  (8-501)

tempMv[1]=0  (8-502)

The variable currPic specifies the current picture.

The variable availableFlagN is set equal to FALSE, and the following applies:

-   . . .

When availableFlagN is equal to TRUE, the following applies:

-   If all of the following conditions are true, tempMV is set equal to mvL1N:
    -   predFlagL1N is equal to 1,
    -   DiffPicOrderCnt(ColPic, RefPicList1[refIdxL1N]) is equal to 0,
    -   DiffPicOrderCnt(aPic, currPic) is less than or equal to 0 for every picture aPic in every reference picture list of the current slice,
    -   slice_type is equal to B,
    -   collocated_from_l0_flag is equal to 0.
-   Otherwise if all of the following conditions are true, tempMV is set equal to mvL0N:
    -   predFlagL0N is equal to 1,
    -   DiffPicOrderCnt(ColPic, RefPicList0[refIdxL0N]) is equal to 0.

The location (xColCb, yColCb) of the collocated block inside ColPic is derived as follows.

xColCb=Clip3(xCtb, Min(CurPicWidthInSamplesY−1, xCtb+(1<<CtbLog2SizeY)+3), xColCtrCb+(tempMv[0]>>4))  (8-503)

yColCb=Clip3(yCtb, Min(CurPicHeightInSamplesY−1, yCtb+(1<<CtbLog2SizeY)−1), yColCtrCb+(tempMv[1]>>4))  (8-504)
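The conditions that pick tempMV from the selected neighbour N, listed ahead of equations (8-503) and (8-504), can be sketched as follows. This is a simplified model under stated assumptions: pictures are represented only by their picture order count (POC), DiffPicOrderCnt is taken as a plain POC difference, and the dictionary layout and function names are hypothetical.

```python
def diff_pic_order_cnt(poc_a, poc_b):
    """DiffPicOrderCnt(a, b), modelled here as a plain POC difference."""
    return poc_a - poc_b


def select_temp_mv(n, col_poc, curr_poc, ref_pocs_l0, ref_pocs_l1,
                   slice_type, collocated_from_l0_flag):
    """Pick tempMV from neighbour N, or return the zero vector.

    n is a dict with 'pred_flag', 'ref_idx' and 'mv', each a pair
    indexed by reference list (0 or 1). ref_pocs_lX holds the POC of
    every picture in reference picture list X.
    """
    # True when no reference picture follows the current picture.
    no_backward = all(diff_pic_order_cnt(p, curr_poc) <= 0
                      for p in ref_pocs_l0 + ref_pocs_l1)

    # List 1 is tried first, under the stricter set of conditions.
    if (n["pred_flag"][1] == 1
            and diff_pic_order_cnt(col_poc, ref_pocs_l1[n["ref_idx"][1]]) == 0
            and no_backward
            and slice_type == "B"
            and collocated_from_l0_flag == 0):
        return n["mv"][1]

    # Otherwise list 0 is used if it points at the collocated picture.
    if (n["pred_flag"][0] == 1
            and diff_pic_order_cnt(col_poc, ref_pocs_l0[n["ref_idx"][0]]) == 0):
        return n["mv"][0]

    return (0, 0)  # tempMV keeps its initial zero value
```

In both branches the neighbour's motion vector is accepted only when its reference picture is the collocated picture itself, so no scaling of tempMV is needed before it is used as a displacement.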

The array colPredMode is set equal to the prediction mode array CuPredMode of the collocated picture specified by ColPic.

The motion vectors ctrMvL0 and ctrMvL1, the prediction list utilization flags ctrPredFlagL0 and ctrPredFlagL1, and the reference indices ctrRefIdxL0 and ctrRefIdxL1 are derived as follows:

-   If colPredMode[xColCb][yColCb] is equal to MODE_INTER, the following applies:
    -   The variable currCb specifies the luma coding block covering (xCtrCb, yCtrCb) inside the current picture.
    -   The variable colCb specifies the luma coding block covering the modified location given by ((xColCb>>3)<<3, (yColCb>>3)<<3) inside the ColPic.
    -   The luma location (xColCb, yColCb) is set equal to the top-left sample of the collocated luma coding block specified by colCb relative to the top-left luma sample of the collocated picture specified by ColPic.
    -   The derivation process for temporal motion vector prediction in subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb), centerRefIdxL0, and sbFlag set equal to 1 as inputs and the output being assigned to ctrMvL0 and ctrPredFlagL0.
    -   The derivation process for temporal motion vector prediction in subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb), centerRefIdxL1, and sbFlag set equal to 1 as inputs and the output being assigned to ctrMvL1 and ctrPredFlagL1.
-   Otherwise, the following applies:

ctrPredFlagL0=0  (8-505)

ctrPredFlagL1=0  (8-506)

8.3.2.11 Derivation Process for Temporal Luma Motion Vector Prediction

Inputs to this process are:

-   . . .

Outputs of this process are:

-   the motion vector prediction mvLXCol in 1/16 fractional-sample accuracy,
-   the availability flag availableFlagLXCol.

The variable currCb specifies the current luma coding block at luma location (xCb, yCb).

The variables mvLXCol and availableFlagLXCol are derived as follows:

-   If slice_temporal_mvp_enabled_flag is equal to 0, both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
-   Otherwise (slice_temporal_mvp_enabled_flag is equal to 1), the following ordered steps apply:
    1. The bottom right collocated motion vector is derived as follows:

xColBr=xCb+cbWidth  (8-330)

yColBr=yCb+cbHeight  (8-331)

        -   If yCb>>CtbLog2SizeY is equal to yColBr>>CtbLog2SizeY, yColBr is less than pic_height_in_luma_samples and xColBr is less than pic_width_in_luma_samples, the following applies:
            -   The variable colCb specifies the luma coding block covering the modified location given by ((xColBr>>3)<<3, (yColBr>>3)<<3) inside the collocated picture specified by ColPic.
            -   The luma location (xColCb, yColCb) is set equal to the top-left sample of the collocated luma coding block specified by colCb relative to the top-left luma sample of the collocated picture specified by ColPic.
            -   The derivation process for collocated motion vectors as specified in clause 8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb), refIdxLX and sbFlag set equal to 0 as inputs, and the output is assigned to mvLXCol and availableFlagLXCol.
        -   Otherwise, both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.

    2. When availableFlagLXCol is equal to 0, the central collocated motion vector is derived as follows:

xColCtr=xCb+(cbWidth>>1)  (8-332)

yColCtr=yCb+(cbHeight>>1)  (8-333)

        -   The variable colCb specifies the luma coding block covering the modified location given by ((xColCtr>>3)<<3, (yColCtr>>3)<<3) inside the collocated picture specified by ColPic.
        -   The luma location (xColCb, yColCb) is set equal to the top-left sample of the collocated luma coding block specified by colCb relative to the top-left luma sample of the collocated picture specified by ColPic.
        -   The derivation process for collocated motion vectors as specified in clause 8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb), refIdxLX and sbFlag set equal to 0 as inputs, and the output is assigned to mvLXCol and availableFlagLXCol.
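The two ordered steps of 8.3.2.11 amount to probing a bottom-right position first and a center position second, each snapped to an 8×8 grid. The sketch below returns only the list of positions to probe, in order; the function name and the idea of returning a list are illustrative assumptions, and the actual motion fetch of clause 8.3.2.12 is not modelled.

```python
def tmvp_candidate_positions(x_cb, y_cb, cb_width, cb_height,
                             ctb_log2_size_y, pic_width, pic_height):
    """Return collocated positions to probe, in checking order.

    Each position is aligned to the 8x8 motion grid via (v >> 3) << 3.
    """
    positions = []

    # Step 1: bottom-right position per (8-330) and (8-331). It is used
    # only if it stays in the same CTB row and inside the picture.
    x_br = x_cb + cb_width
    y_br = y_cb + cb_height
    same_ctb_row = (y_cb >> ctb_log2_size_y) == (y_br >> ctb_log2_size_y)
    if same_ctb_row and y_br < pic_height and x_br < pic_width:
        positions.append(((x_br >> 3) << 3, (y_br >> 3) << 3))

    # Step 2: center position per (8-332) and (8-333), probed when
    # step 1 did not yield an available motion vector.
    x_ctr = x_cb + (cb_width >> 1)
    y_ctr = y_cb + (cb_height >> 1)
    positions.append(((x_ctr >> 3) << 3, (y_ctr >> 3) << 3))

    return positions
```

A caller would probe the returned positions one by one and stop at the first that gives availableFlagLXCol equal to 1.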

8.3.2.12 Derivation Process for Collocated Motion Vectors

Inputs to this process are:

-   a variable currCb specifying the current coding block,
-   a variable colCb specifying the collocated coding block inside the collocated picture specified by ColPic,
-   a luma location (xColCb, yColCb) specifying the top-left sample of the collocated luma coding block specified by colCb relative to the top-left luma sample of the collocated picture specified by ColPic,
-   a reference index refIdxLX, with X being 0 or 1,
-   a flag indicating a subblock temporal merging candidate sbFlag.

Outputs of this process are:

-   the motion vector prediction mvLXCol in 1/16 fractional-sample accuracy,
-   the availability flag availableFlagLXCol.

The variable currPic specifies the current picture.

The arrays predFlagL0Col[x][y], mvL0Col[x][y] and refIdxL0Col[x][y] are set equal to PredFlagL0[x][y], MvL0[x][y] and RefIdxL0[x][y], respectively, of the collocated picture specified by ColPic, and the arrays predFlagL1Col[x][y], mvL1Col[x][y] and refIdxL1Col[x][y] are set equal to PredFlagL1[x][y], MvL1[x][y] and RefIdxL1[x][y], respectively, of the collocated picture specified by ColPic.

The variables mvLXCol and availableFlagLXCol are derived as follows:

-   If colCb is coded in an intra prediction mode, both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
-   Otherwise, the motion vector mvCol, the reference index refIdxCol and the reference list identifier listCol are derived as follows:
    -   If sbFlag is equal to 0, availableFlagLXCol is set to 1 and the following applies:
        -   If predFlagL0Col[xColCb][yColCb] is equal to 0, mvCol, refIdxCol and listCol are set equal to mvL1Col[xColCb][yColCb], refIdxL1Col[xColCb][yColCb] and L1, respectively.
        -   Otherwise, if predFlagL0Col[xColCb][yColCb] is equal to 1 and predFlagL1Col[xColCb][yColCb] is equal to 0, mvCol, refIdxCol and listCol are set equal to mvL0Col[xColCb][yColCb], refIdxL0Col[xColCb][yColCb] and L0, respectively.
        -   Otherwise (predFlagL0Col[xColCb][yColCb] is equal to 1 and predFlagL1Col[xColCb][yColCb] is equal to 1), the following assignments are made:
            -   If NoBackwardPredFlag is equal to 1, mvCol, refIdxCol and listCol are set equal to mvLXCol[xColCb][yColCb], refIdxLXCol[xColCb][yColCb] and LX, respectively.
            -   Otherwise, mvCol, refIdxCol and listCol are set equal to mvLNCol[xColCb][yColCb], refIdxLNCol[xColCb][yColCb] and LN, respectively, with N being the value of collocated_from_l0_flag.
    -   Otherwise (sbFlag is equal to 1), the following applies:
        -   If PredFlagLXCol[xColCb][yColCb] is equal to 1, mvCol, refIdxCol, and listCol are set equal to mvLXCol[xColCb][yColCb], refIdxLXCol[xColCb][yColCb], and LX, respectively, and availableFlagLXCol is set to 1.
        -   Otherwise (PredFlagLXCol[xColCb][yColCb] is equal to 0), the following applies:
            -   If DiffPicOrderCnt(aPic, currPic) is less than or equal to 0 for every picture aPic in every reference picture list of the current slice and PredFlagLYCol[xColCb][yColCb] is equal to 1, mvCol, refIdxCol, and listCol are set to mvLYCol[xColCb][yColCb], refIdxLYCol[xColCb][yColCb] and LY, respectively, with Y being equal to !X, where X is the value of X this process is invoked for. availableFlagLXCol is set to 1.
            -   Both the components of mvLXCol are set to 0 and availableFlagLXCol is set equal to 0.

When availableFlagLXCol is equal to TRUE, mvLXCol and availableFlagLXCol are derived as follows:

. . . (remaining details similar to the current version of VVC specification).
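The list-selection part of 8.3.2.12 can be sketched as below, covering only the choice of which reference list of the collocated block supplies mvCol; the later scaling step is not modelled. Two assumptions are made and should be treated as such: a single boolean `no_backward_pred` stands in both for NoBackwardPredFlag and for the "DiffPicOrderCnt(aPic, currPic) is less than or equal to 0 for every picture" condition, and the last bullet of the sbFlag branch is read as the fall-through case. All names are hypothetical.

```python
def select_col_motion(col, x, sb_flag, no_backward_pred,
                      collocated_from_l0_flag):
    """Choose the collocated motion for target list X (x is 0 or 1).

    col is a dict with 'intra' (bool) and 'pred_flag', 'mv', 'ref_idx'
    pairs indexed by list. Returns (available, mv, ref_idx, list_idx).
    """
    unavailable = (0, (0, 0), -1, None)
    if col.get("intra"):
        return unavailable

    pf = col["pred_flag"]
    if sb_flag == 0:
        if pf[0] == 0:
            lst = 1                        # only list 1 motion exists
        elif pf[1] == 0:
            lst = 0                        # only list 0 motion exists
        elif no_backward_pred:
            lst = x                        # bi-pred: follow target list
        else:
            lst = collocated_from_l0_flag  # bi-pred: list LN
    else:
        if pf[x] == 1:
            lst = x                        # same-list motion preferred
        elif no_backward_pred and pf[1 - x] == 1:
            lst = 1 - x                    # list LY with Y = !X
        else:
            return unavailable             # assumed fall-through case
    return (1, col["mv"][lst], col["ref_idx"][lst], lst)
```

Under this reading, the sub-block path (sbFlag equal to 1) prefers motion from the same list as the target and borrows from the opposite list only in the low-delay case, whereas the regular TMVP path always reports availability for an inter-coded collocated block.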

The examples described above may be incorporated in the context of the methods described below, e.g., methods 1100, 1200 and 1300, which may be implemented at a video decoder and/or video encoder.

FIG. 11 shows a flowchart of an example method for video processing. The method 1100 includes, at step 1110, selecting, for sub-block level processing of a current video block, motion information associated with a spatial neighboring block.

In some embodiments, and in the context of Example 1, the spatial neighboring block is a first spatial neighboring block that is checked in the sub-block based merge list.

In some embodiments, and in the context of Example 4, selecting the spatial neighboring block is based on signaling in a video parameter set (VPS), a sequence parameter set (SPS), a picture parameter set (PPS), a slice header, a tile group header, a coding tree unit (CTU), a tile, a coding unit (CU), a prediction unit (PU) or a CTU row. In other embodiments, selecting the spatial neighboring block is based on a height or a width of the current video block.

The method 1100 includes, at step 1120, deriving, based on the motion information, a motion vector prediction candidate.

In some embodiments, and in the context of Example 2, deriving the motion vector prediction candidate includes the steps of identifying, based on the motion information, a temporal neighboring block, and deriving the motion vector prediction candidate based on the temporal neighboring block. In some embodiments, the motion information is scaled prior to identifying the temporal neighboring block.

In some embodiments, and in the context of Example 2, identifying the temporal neighboring block includes the steps of performing a sequential multi-step search over each of a plurality of temporal neighboring blocks, and terminating the sequential multi-step search upon identifying a first of the plurality of temporal neighboring blocks that returns at least one valid motion vector prediction candidate. In one example, the sequential multi-step search is over one or more temporal blocks in a coding tree unit (CTU) that comprises the identified temporal neighboring block. In another example, the sequential multi-step search is over one or more temporal blocks in a single row of a coding tree unit (CTU) that comprises the identified temporal neighboring block.
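The sequential search with early termination described above can be sketched as a simple ordered scan. The function name, the `motion_at` lookup callable and the use of `None` to mean "no valid candidate at this block" are assumptions made for the illustration; how the candidate positions are restricted to a CTU or a CTU row is left to the caller.

```python
from typing import Callable, Iterable, Optional, Tuple

Position = Tuple[int, int]


def find_first_valid_temporal_block(
        candidate_positions: Iterable[Position],
        motion_at: Callable[[Position], Optional[object]],
) -> Optional[Tuple[Position, object]]:
    """Scan temporal blocks in order and stop at the first valid one.

    motion_at(pos) is a hypothetical lookup returning the motion stored
    at pos in the collocated picture, or None when no valid motion
    vector prediction candidate can be derived from that block.
    """
    for pos in candidate_positions:
        motion = motion_at(pos)
        if motion is not None:
            return pos, motion  # terminate the multi-step search here
    return None
```

Stopping at the first hit bounds the worst case by the number of positions supplied, which is why limiting the list to one CTU or one CTU row keeps the search cheap.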

In some embodiments, and in the context of Example 3, the motion information is replaced by a history-based motion vector prediction (HMVP) candidate prior to deriving the motion vector prediction candidate. In an example, the HMVP candidate is scaled prior to deriving the motion vector prediction candidate.

The method 1100 includes, at step 1130, adding the motion vector prediction candidate to a sub-block based merge list that is different from a merge list and excludes block-level prediction candidates.

The method 1100 includes, at step 1140, reconstructing the current video block or decoding other video blocks based on the motion vector prediction candidate.

FIG. 12 shows a flowchart of an example method for video processing. The method 1200 includes, at step 1210, deriving, for sub-block level processing of a current video block, a motion vector prediction candidate.

The method 1200 includes, at step 1220, assigning a merge index to a type of the motion vector prediction candidate.

The method 1200 includes, at step 1230, adding the motion vector prediction candidate and the merge index to a sub-block based merge list that is different from a merge list and excludes block-level prediction candidates.

In some embodiments, and in the context of Example 7, the method 1200 further includes the steps of determining the type of motion information associated with the current video block, and reconstructing the current video block or decoding other video blocks based on one or more motion vector prediction candidates from the sub-block based merge list, wherein the one or more motion vector prediction candidates are selected based on the type. In one example, the merge index within a first range corresponds to one or more alternative temporal motion vector prediction (ATMVP) candidates. In another example, the merge index within a second range corresponds to one or more affine candidates. In yet another example, the merge index is based on signaling in a video parameter set (VPS), a sequence parameter set (SPS), a picture parameter set (PPS), a slice header, a tile group header, a coding tree unit (CTU), a tile, a coding unit (CU), a prediction unit (PU) or a CTU row. In yet another example, the type of the motion vector prediction candidate is an affine motion vector prediction candidate, an alternative temporal motion vector prediction (ATMVP) candidate or a spatial-temporal motion vector prediction (STMVP) candidate.
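One way to picture the association between merge-index ranges and candidate types is a fixed partition of the index space. The sketch below is purely illustrative: the range sizes (`num_atmvp`, `num_affine`), the ordering of the ranges and the function name are assumptions, since the text above does not fix any of them.

```python
def candidate_type_for_merge_index(merge_idx, num_atmvp=1, num_affine=4):
    """Map a sub-block merge index to a candidate type.

    Assumed layout: indices [0, num_atmvp) are ATMVP candidates and
    indices [num_atmvp, num_atmvp + num_affine) are affine candidates.
    """
    if merge_idx < 0:
        raise ValueError("merge index must be non-negative")
    if merge_idx < num_atmvp:
        return "ATMVP"
    if merge_idx < num_atmvp + num_affine:
        return "AFFINE"
    raise ValueError("merge index outside the sub-block merge list")
```

With such a fixed mapping the candidate type is known from the index alone, so a decoder following this layout would only need to construct the candidates of the type that is actually selected.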

In some embodiments, and in the context of Example 8, adding the motion vector prediction candidate to the sub-block based merge list is based on an adaptive ordering. In one example, one or more alternative temporal motion vector prediction (ATMVP) candidates are added to the sub-block based merge list prior to any affine motion vector prediction candidates. In another example, one or more affine motion vector prediction candidates are added to the sub-block based merge list prior to any alternative temporal motion vector prediction (ATMVP) candidates.

FIG. 13 shows a flowchart of an example method for video processing. The method 1300 includes, at step 1310, deriving, for sub-block level processing of a current video block, a motion vector prediction candidate.

The method 1300 includes, at step 1320, adding, based on an adaptive ordering, the motion vector prediction candidate to a sub-block based merge list that is different from a merge list and excludes block-level prediction candidates.

In some embodiments, and in the context of Example 9, the adaptive ordering is based on coded information of the current block. In other embodiments, the adaptive ordering is based on coded information of one or more neighboring blocks of the current block. In yet other embodiments, the adaptive ordering is based on signaling in a video parameter set (VPS), a sequence parameter set (SPS), a picture parameter set (PPS), a slice header, a tile group header, a coding tree unit (CTU), a tile, a coding unit (CU), a prediction unit (PU) or a CTU row. In yet other embodiments, the adaptive ordering is based on a first number of available affine motion vector prediction candidates and/or a second number of available non-affine motion vector prediction candidates.
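As a sketch of count-based adaptive ordering, the following function places whichever candidate group is larger first. This specific rule is an assumption chosen only to make the idea concrete; the text above states that the ordering may depend on the two counts but does not prescribe how. The function name and the `max_size` limit are likewise hypothetical.

```python
def build_subblock_merge_list(atmvp_cands, affine_cands, max_size=5):
    """Order ATMVP and affine candidates adaptively, then truncate.

    Illustrative rule (an assumption): affine candidates go first when
    they outnumber the non-affine (ATMVP) candidates; otherwise ATMVP
    candidates go first.
    """
    affine_first = len(affine_cands) > len(atmvp_cands)
    if affine_first:
        ordered = list(affine_cands) + list(atmvp_cands)
    else:
        ordered = list(atmvp_cands) + list(affine_cands)
    return ordered[:max_size]
```

Because an encoder and a decoder would evaluate the same counts, both arrive at the same order without extra signaling, which is the appeal of an ordering derived from already-coded information.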

In some embodiments, e.g., as disclosed in Items 6-8 and 15-16 in Section 8, an example video processing method includes determining a default motion candidate for a sub-block based coding mode for a conversion between a current video block and a bitstream representation of the current video block using one of the following: (a) a uni-prediction candidate that is derived by scaling a starting motion candidate to a reference picture index within a reference picture list X; or (b) a bi-prediction candidate that is derived by scaling to reference picture indexes within two reference picture lists; or (c) a candidate in either (a) or (b) depending on a picture type or a slice type of the current video block; or (d) a candidate derived for a temporal motion vector predictor (TMVP) process of the current video block. For example, under option (a), the starting motion vector could be a motion vector that is associated with a block pointing to a collocated picture or the first spatially neighboring block that has a motion vector pointing to a collocated picture or a zero motion vector or another choice of motion vector. Additional features and implementation options are described in Section 8, items 6-8 and 15-16.

9. Example Implementations of the Disclosed Technology

FIG. 14 is a block diagram of a video processing apparatus 1400. The apparatus 1400 may be used to implement one or more of the methods described herein. The apparatus 1400 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, and so on. The apparatus 1400 may include one or more processors 1402, one or more memories 1404 and video processing hardware 1406. The processor(s) 1402 may be configured to implement one or more methods (including, but not limited to, methods 1100, 1200 and 1300) described in the present document. The memory (memories) 1404 may be used for storing data and code used for implementing the methods and techniques described herein. The video processing hardware 1406 may be used to implement, in hardware circuitry, some techniques described in the present document.

FIG. 16 is a block diagram showing an example video processing system 1600 in which various techniques disclosed herein may be implemented. Various implementations may include some or all of the components of the system 1600. The system 1600 may include input 1602 for receiving video content. The video content may be received in a raw or uncompressed format, e.g., 8 or 10 bit multi-component pixel values, or may be in a compressed or encoded format. The input 1602 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interface include wired interfaces such as Ethernet, passive optical network (PON), etc. and wireless interfaces such as Wi-Fi or cellular interfaces.

The system 1600 may include a coding component 1604 that may implement the various coding or encoding methods described in the present document. The coding component 1604 may reduce the average bitrate of video from the input 1602 to the output of the coding component 1604 to produce a coded representation of the video. The coding techniques are therefore sometimes called video compression or video transcoding techniques. The output of the coding component 1604 may be either stored, or transmitted via a communication connection, as represented by the component 1606. The stored or communicated bitstream (or coded) representation of the video received at the input 1602 may be used by the component 1608 for generating pixel values or displayable video that is sent to a display interface 1610. The process of generating user-viewable video from the bitstream representation is sometimes called video decompression. Furthermore, while certain video processing operations are referred to as “coding” operations or tools, it will be appreciated that the coding tools or operations are used at an encoder and corresponding decoding tools or operations that reverse the results of the coding will be performed by a decoder.

Examples of a peripheral bus interface or a display interface may include universal serial bus (USB) or high definition multimedia interface (HDMI) or DisplayPort, and so on. Examples of storage interfaces include SATA (serial advanced technology attachment), PCI, IDE interface, and the like. The techniques described in the present document may be embodied in various electronic devices such as mobile phones, laptops, smartphones or other devices that are capable of performing digital data processing and/or video display.

FIG. 17 is a flowchart representation of a method 1700 for video processing in accordance with the present disclosure. The method 1700 includes, at operation 1710, determining, during a conversion between a current block of a video and a bitstream representation of the video, a temporal motion vector prediction candidate for at least a sub-block of the current block. The temporal motion vector prediction candidate is determined based on K neighboring blocks of the current block, K being a positive integer. The method 1700 includes, at operation 1720, performing the conversion based on the temporal motion vector prediction candidate for the sub-block.

In some embodiments, the temporal motion vector prediction candidate is completely determined based on K neighboring blocks of the current block. In some embodiments, K=1. In some embodiments, K=2 or 3. In some embodiments, the temporal motion vector prediction candidate is determined without checking all motion candidates in a merge list of the current block. In some embodiments, one of the K spatial neighboring blocks is the same as a first spatial neighboring block checked in a merge list construction process of a video block. In some embodiments, a spatial neighboring block of the video block is adjacent to a bottom-left corner of the current block. In some embodiments, at least one of the K spatial neighboring blocks is different from spatial neighboring blocks checked in a merge list construction process of a video block. In some embodiments, the K spatial neighboring blocks are determined by checking a plurality of available spatial neighboring blocks in a first order.

In some embodiments, the method further includes determining that a spatial neighboring block is available in case the spatial neighboring block is coded prior to performing the conversion of the current block. In some embodiments, the spatial neighboring block is within a same tile as the current block. In some embodiments, the plurality of available spatial neighboring blocks includes a first block adjacent to a bottom-left corner of the current block and a second block adjacent to a top-right corner of the current block. In some embodiments, the method includes checking the K spatial neighboring blocks of the current block in a first order, wherein spatial neighboring blocks in a block-based merge list construction process of a video block are checked in a second order, the second order being different from the first order. In some embodiments, K is equal to 1, and the first order indicates that a first spatial neighboring block adjacent to a bottom-left corner of the current block is to be checked while the second order indicates that a second spatial neighboring block adjacent to an above-right corner of a video block is to be checked.

In some embodiments, the temporal motion vector prediction candidate includes an Alternative Temporal Motion Vector Prediction (ATMVP) candidate. In some embodiments, the method includes identifying a temporal block according to motion information of the K spatial neighboring blocks and deriving motion information of the sub-block based on the motion information of the identified temporal block. In some embodiments, the method further includes identifying a second video block in a different picture according to motion information of the K neighboring blocks and deriving temporal motion information of a sub-block based on the second video block. In some embodiments, a sub-block size is 8×8. In some embodiments, a sub-block size is the same as a block size.

In some embodiments, the conversion comprises encoding the current block to generate the bitstream representation. In some embodiments, the conversion comprises decoding the bitstream representation to generate the current block.

FIG. 18 is a flowchart representation of a method 1800 for video processing in accordance with the present disclosure. The method 1800 includes, at operation 1810, determining, during a conversion between a current block of a video and a bitstream representation of the video, a temporal motion vector prediction candidate based on a temporal neighboring block of the current block. The temporal neighboring block is identified based on motion information of a spatial neighboring block selected from one or more spatial neighboring blocks that are different from at least one spatial neighboring block used in a merge list construction process of a video block. The method 1800 also includes, at operation 1820, performing the conversion based on the temporal motion vector prediction candidate.

In some embodiments, the temporal motion vector prediction candidateincludes an Alternative Temporal Motion Vector Prediction (ATMVP)candidate. In some embodiments, the one or more spatial neighboringblocks are different from all candidates in the merge list of thecurrent block. In some embodiments, the one or more spatial neighboringblocks include a block adjacent to a top-left corner of the currentblock. In some embodiments, a subset of the one or more spatialneighboring blocks is same as one or more candidates that are derivedfrom a merge list construction process of a video block. In someembodiments, the one or more spatial neighboring blocks include a firstblock adjacent to a bottom-left corner of the current block or a secondblock adjacent to a top-right corner of the current block.

In some embodiments, the motion information is scaled before the temporal neighboring block is identified. In some embodiments, the spatial neighboring block is selected based on information in a video parameter set (VPS), a sequence parameter set (SPS), a picture parameter set (PPS), a slice header, a tile group header, a coding tree unit (CTU), a tile, a coding unit (CU), a prediction unit (PU) or a CTU row. In some embodiments, the spatial neighboring block is selected based on a height or a width of the current block.

FIG. 19 is a flowchart representation of a method 1900 for video processing in accordance with the present disclosure. The method 1900 includes, at operation 1910, maintaining, for a conversion between a current block of a video and a bitstream representation of the video, a table of motion candidates based on past conversions of the video and the bitstream representation. The method 1900 includes, at operation 1920, deriving a temporal motion vector prediction candidate based on the table of motion candidates. The method 1900 also includes, at operation 1930, performing the conversion based on the temporal motion vector prediction candidate.

In some embodiments, the temporal motion vector prediction candidateincludes an Alternative Temporal Motion Vector Prediction (ATMVP)candidate. In some embodiments, the temporal motion vector predictioncandidate is scaled prior to the conversion. In some embodiments, themethod includes updating the table of motion candidates based on thetemporal motion vector prediction candidate. In some embodiments, themethod includes performing a subsequent conversion of the video and thebitstream representation using the updated table of motion candidates.In some embodiments, deriving the temporal motion vector predictioncandidate further comprises deriving the temporal motion vectorprediction candidate based on spatial neighboring blocks of a secondvideo block.

FIG. 20 is a flowchart representation of a method 2000 for video processing in accordance with the present disclosure. The method 2000 includes, at operation 2010, determining, for a conversion between a current block of a video and a bitstream representation of the video, one or more temporal motion vector prediction candidates for the current block. The method 2000 also includes, at operation 2020, performing the conversion based on the one or more temporal motion vector prediction candidates. The one or more temporal motion vector prediction candidates can be determined by identifying a first temporal adjacent block of the current block based on an initial motion vector and examining additional temporal adjacent blocks to obtain the one or more temporal motion vector prediction candidates. The first temporal adjacent block includes invalid motion information.

In some embodiments, the one or more temporal motion vector prediction candidates include an Alternative Temporal Motion Vector Prediction (ATMVP) candidate. In some embodiments, the first temporal adjacent block is intra-coded. In some embodiments, the additional temporal adjacent blocks comprise a second temporal adjacent block that includes a starting point positioned adjacent to a bottom-right corner of a starting point of the first temporal adjacent block.

In some embodiments, the additional temporal adjacent blocks are identified based on a sequential multi-step search of blocks associated with the first temporal adjacent block. In some embodiments, the sequential multi-step search comprises examining spatial adjacent blocks of the first temporal adjacent block in an order of left, above, right, and bottom. In some embodiments, the sequential multi-step search further comprises examining spatial non-adjacent blocks that are one step away from the first temporal adjacent block in an order of left, above, right, and bottom. In some embodiments, the additional temporal adjacent blocks are positioned within a region associated with the first temporal adjacent block. In some embodiments, the region includes a Coding Tree Unit (CTU) associated with the first temporal adjacent block. In some embodiments, the region includes a single row of the CTU associated with the first temporal adjacent block.
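The sequential multi-step search can be sketched as below. The left, above, right, bottom order and the restriction to a region come from the text; the 8-sample step size, the limit of two steps, the coordinate convention (y grows downward) and the names `has_valid_motion` and `in_region` are assumptions for illustration only.

```python
from typing import Callable, Optional, Tuple

Pos = Tuple[int, int]


def search_valid_temporal_block(
    start: Pos,
    has_valid_motion: Callable[[Pos], bool],
    in_region: Callable[[Pos], bool],
    block: int = 8,
    max_steps: int = 2,
) -> Optional[Pos]:
    """Return the first position with valid motion, or None if the search fails."""
    if has_valid_motion(start):
        return start
    x0, y0 = start
    # Step 1 visits adjacent blocks; step 2 visits blocks one step further away.
    for step in range(1, max_steps + 1):
        d = step * block
        # Order within each step: left, above, right, bottom.
        for dx, dy in ((-d, 0), (0, -d), (d, 0), (0, d)):
            cand = (x0 + dx, y0 + dy)
            if in_region(cand) and has_valid_motion(cand):
                return cand
    return None
```

Passing an `in_region` predicate that tests for the enclosing CTU, or for a single CTU row, corresponds to the two region embodiments mentioned above.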

FIG. 21 is a flowchart representation of a method 2100 for video processing in accordance with the present disclosure. The method 2100 includes, at operation 2110, determining, for a conversion between a current block of a video and a bitstream representation of the video, one or more temporal motion vector prediction candidates for the current block. The one or more temporal motion vector prediction candidates comprise a default temporal motion vector prediction candidate. The method 2100 includes, at operation 2120, performing the conversion based on the one or more temporal motion vector prediction candidates.

In some embodiments, the default temporal motion vector prediction candidate is determined after identifying a first temporal adjacent block of the current block based on an initial motion vector. The first temporal adjacent block includes invalid motion information. In some embodiments, the default temporal motion vector is inherited from a spatial neighboring block of the current block. In some embodiments, the default temporal motion vector is scaled. In some embodiments, the default temporal motion vector prediction candidate is derived based on a starting point motion vector (or an initial motion vector). The starting point motion vector (or the initial motion vector) is either associated with a spatial adjacent block of the current block or a zero motion vector. In some embodiments, the starting point motion vector is completely determined based on motion information associated with one or more spatial adjacent blocks of the current block. In some embodiments, the starting point motion vector is associated with a block whose corresponding reference picture is collocated with a reference picture of the current block. In some embodiments, the block includes a spatial adjacent block of the current block, a spatial non-adjacent block of the current block, or a temporal adjacent block of the current block.

In some embodiments, in case a first spatial adjacent block selected from spatial adjacent blocks of the current block according to a sequential order is inter-coded and a first motion vector of the first spatial adjacent block is directed to a collocated picture of the current block, the starting motion vector is determined to be the first motion vector; otherwise, the starting motion vector is determined to be a zero motion vector. In some embodiments, the starting point motion vector is determined to be motion information of a represented block in case motion information of the represented block that is identified by the starting point motion vector and a center position of the block is unavailable. The represented block is a block that covers a point corresponding to the starting point motion vector in a collocated picture. In some embodiments, the starting point motion vector is used to derive sub-block motion.
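The selection rule for the starting motion vector can be written compactly. In this sketch the `SpatialBlock` record and its field names are hypothetical, reference pictures are identified by a plain integer, and only the first block in the sequential order is examined, which is one reading of the paragraph above.

```python
from typing import NamedTuple, Sequence, Tuple

MV = Tuple[int, int]


class SpatialBlock(NamedTuple):
    inter_coded: bool  # False for intra-coded blocks
    mv: MV             # motion vector of the block
    ref_pic: int       # identifier of the picture the motion vector points to


def starting_motion_vector(ordered: Sequence[SpatialBlock], col_pic: int) -> MV:
    """Return the first block's MV if it points to the collocated picture, else zero."""
    if ordered:
        first = ordered[0]
        if first.inter_coded and first.ref_pic == col_pic:
            return first.mv
    return (0, 0)
```

Because no further blocks are scanned when the first one fails the test, the cost of the check is fixed regardless of how many spatial adjacent blocks exist.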

In some embodiments, the default temporal motion vector is a uni-prediction candidate derived by scaling a motion vector to a reference picture index within a reference picture list X, X being 0 or 1. In some embodiments, the reference picture index is 0. In some embodiments, the reference picture index is a smallest reference picture index that corresponds to a short-term reference picture. In some embodiments, X is determined based on a slice or a picture associated with the current block.

In some embodiments, the default temporal motion vector is a bi-prediction candidate derived by scaling a motion vector to a reference picture index within a reference picture list. In some embodiments, for each reference picture in the reference picture list, the reference picture index is the same as a target reference picture index of a temporal motion vector prediction candidate. In some embodiments, whether the default temporal motion vector is a uni-prediction candidate or a bi-prediction candidate is determined based on a picture type or a slice type associated with the current block. In some embodiments, whether the default temporal motion vector is a uni-prediction candidate or a bi-prediction candidate is determined based on a size of the current block.
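The uni-prediction and bi-prediction defaults can be illustrated with a simplified scaling sketch. The text does not give the scaling formula, so the code assumes the common approach of scaling a motion vector in proportion to picture order count (POC) distances; practical codecs use fixed-point arithmetic with clipping instead of the floating-point division shown here. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

MV = Tuple[int, int]


def scale_mv(mv: MV, tb: int, td: int) -> MV:
    """Scale mv by tb/td, where tb and td are POC distances (simplified)."""
    if td == 0:
        return mv  # assumption: leave the vector unscaled when td is zero
    return (int(round(mv[0] * tb / td)), int(round(mv[1] * tb / td)))


def smallest_short_term_ref_idx(is_long_term: Sequence[bool]) -> Optional[int]:
    """Smallest reference index that refers to a short-term reference picture."""
    for idx, long_term in enumerate(is_long_term):
        if not long_term:
            return idx
    return None


def default_candidate(
    mv: MV, td: int, cur_poc: int, ref_pocs: List[List[int]],
    bi_pred: bool, x: int = 0, ref_idx: int = 0,
) -> Dict[int, MV]:
    """Build a default candidate for list X only, or for both lists when bi_pred."""
    lists = (0, 1) if bi_pred else (x,)
    return {
        lx: scale_mv(mv, cur_poc - ref_pocs[lx][ref_idx], td)
        for lx in lists
    }
```

Here `td` is the POC distance between the collocated picture and the picture its motion vector refers to, and `tb` is the distance between the current picture and the target reference picture in each list.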

FIG. 22 is a flowchart representation of a method 2200 for video processing in accordance with the present disclosure. The method 2200 includes, at operation 2210, determining, for a conversion between a current block of a video and a bitstream representation of the video, a sub-block level merge candidate list that includes at least one sub-block coding type. The method 2200 includes, at operation 2220, performing the conversion based on the sub-block level merge candidate list.

In some embodiments, the at least one sub-block coding type comprises a sub-block based temporal motion vector prediction coding type. In some embodiments, at least one sub-block coding type comprises an affine motion prediction coding type. In some embodiments, each of at least one sub-block coding type is assigned with a range of merge indices. In some embodiments, the merge index within a first range corresponds to the sub-block based temporal motion vector prediction coding type. In some embodiments, the first range includes a single value of 0. In some embodiments, the merge index within a second range corresponds to the affine motion prediction coding type. In some embodiments, the second range excludes a value of 0.
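The mapping from merge index to coding type can be shown with a small lookup. The first range {0} and a second range that excludes 0 follow the embodiments above; the upper bound of 5 candidates and the names `RANGES` and `coding_type_for_merge_index` are assumptions.

```python
from typing import Dict

# Assumed layout: index 0 is the sub-block TMVP candidate, indices 1..4 are affine.
RANGES: Dict[str, range] = {
    "sbtmvp": range(0, 1),
    "affine": range(1, 5),
}


def coding_type_for_merge_index(idx: int, ranges: Dict[str, range] = RANGES) -> str:
    """Return the sub-block coding type whose index range contains idx."""
    for name, index_range in ranges.items():
        if idx in index_range:
            return name
    raise ValueError(f"merge index {idx} is outside every configured range")
```

With fixed ranges, a decoder can infer the coding type from the parsed merge index alone, without first constructing the whole candidate list.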

In some embodiments, a motion candidate of the sub-block based temporal motion vector prediction coding type is always available in the sub-block level merge candidate list. In some embodiments, temporal information is only allowed to derive motion candidates of the sub-block based temporal motion vector prediction coding type. In some embodiments, the range of merge indices for a coding type is signaled in a video parameter set (VPS), a sequence parameter set (SPS), a picture parameter set (PPS), a slice header, a tile group header, a coding tree unit (CTU), a tile, a coding unit (CU), a prediction unit (PU) or a CTU row. In some embodiments, the range of merge indices for a coding type is based on a width or a height of the current block.

In some embodiments, motion candidates of the at least one sub-block coding type are added to the sub-block level merge candidate list based on an adaptive ordering. In some embodiments, the adaptive ordering indicates that a motion candidate of the sub-block based temporal motion vector prediction coding type is added to the sub-block level merge candidate list prior to a motion candidate of the affine motion prediction coding type. In some embodiments, the adaptive ordering indicates that a motion candidate of the sub-block based temporal motion vector prediction coding type and a motion candidate of the affine motion prediction type are added to the sub-block level merge candidate list in an interleaved manner. In some embodiments, the adaptive ordering is based on coded information of the current block or neighboring blocks of the current block. In some embodiments, in case a majority of the neighboring blocks of the current block is affine coded, the adaptive ordering indicates that a motion candidate of the affine motion prediction coding type is added to the sub-block level merge candidate list prior to motion candidates of other types. In some embodiments, the adaptive ordering is based on a ratio of affine motion candidates to non-affine motion candidates in the sub-block level merge candidate list. In some embodiments, in case the ratio is greater than a threshold, the adaptive ordering indicates that a motion candidate of the affine motion coding type is added to the sub-block level merge candidate list prior to motion candidates of other types. In some embodiments, the adaptive ordering is applicable to first K affine motion candidates in the sub-block level merge candidate list, K being a positive integer. In some embodiments, the adaptive ordering is signaled in a video parameter set (VPS), a sequence parameter set (SPS), a picture parameter set (PPS), a slice header, a tile group header, a coding tree unit (CTU), a tile, a coding unit (CU), a prediction unit (PU) or a CTU row.
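As an illustration of the neighbor-based embodiment, the sketch below places affine candidates first when a strict majority of the neighboring blocks is affine coded and otherwise keeps the sub-block temporal candidate first. The function name, the list-of-strings representation and the strict-majority test are assumptions; the interleaved and ratio-based orderings are not shown.

```python
from typing import List, Sequence


def order_subblock_candidates(
    sbtmvp: List[str], affine: List[str], neighbor_is_affine: Sequence[bool]
) -> List[str]:
    """Order the sub-block merge list according to how the neighbors are coded."""
    affine_count = sum(1 for flag in neighbor_is_affine if flag)
    # Strict majority: more than half of the neighbors are affine coded.
    affine_first = affine_count * 2 > len(neighbor_is_affine)
    return affine + sbtmvp if affine_first else sbtmvp + affine
```

Placing the more likely coding type at the front of the list tends to give it the smaller merge indices, which are typically cheaper to signal.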

FIG. 23 is a flowchart representation of a method 2300 for video processing in accordance with the present disclosure. The method 2300 includes, at operation 2310, determining, for a conversion between a current block of a video and a bitstream representation of the video, a sub-block level coding technique based on an indication that is signaled in a picture header, a picture parameter set (PPS), a slice header, or a tile group header. The method 2300 includes, at operation 2320, performing the conversion based on the sub-block level coding technique.

In some embodiments, the sub-block level coding technique comprises a sub-block based temporal motion vector prediction coding technique. In some embodiments, the sub-block level coding technique comprises an affine coding technique. In some embodiments, the indication indicates that the sub-block coding technique is disabled.

In some embodiments, the sub-block level motion derivation process and the block level motion derivation process can be unified. FIG. 24A is a flowchart representation of a method 2400 for video processing in accordance with the present disclosure. The method 2400 includes, at operation 2410, determining, for a conversion between a current block of a video and a bitstream representation of the video, a sub-block level temporal motion candidate using a derivation process applicable to a block level temporal motion vector prediction candidate conversion between the current block and the bitstream representation. The method 2400 also includes, at operation 2420, performing the conversion based on the sub-block level temporal motion candidate. FIG. 24B is a flowchart representation of a method 2450 for video processing in accordance with the present disclosure. The method 2450 includes, at operation 2460, determining, for a conversion between a current block of a video and a bitstream representation of the video, a block level temporal motion vector prediction candidate using a derivation process applicable to a sub-block level temporal motion candidate conversion between the current block and the bitstream representation. The method 2450 also includes, at operation 2470, performing the conversion based on the block level temporal motion vector prediction candidate.

In some embodiments, the conversion in the above methods comprises encoding the current block to generate the bitstream representation. In some embodiments, the conversion in the above methods comprises decoding the bitstream representation to generate the current block.

Some embodiments of the disclosed technology include making a decision or determination to enable a video processing tool or mode. In an example, when the video processing tool or mode is enabled, the encoder will use or implement the tool or mode in the processing of a block of video, but may not necessarily modify the resulting bitstream based on the usage of the tool or mode. That is, a conversion from the block of video to the bitstream representation of the video will use the video processing tool or mode when it is enabled based on the decision or determination. In another example, when the video processing tool or mode is enabled, the decoder will process the bitstream with the knowledge that the bitstream has been modified based on the video processing tool or mode. That is, a conversion from the bitstream representation of the video to the block of video will be performed using the video processing tool or mode that was enabled based on the decision or determination.

Some embodiments of the disclosed technology include making a decision or determination to disable a video processing tool or mode. In an example, when the video processing tool or mode is disabled, the encoder will not use the tool or mode in the conversion of the block of video to the bitstream representation of the video. In another example, when the video processing tool or mode is disabled, the decoder will process the bitstream with the knowledge that the bitstream has not been modified using the video processing tool or mode that was disabled based on the decision or determination.

From the foregoing, it will be appreciated that specific embodiments of the presently disclosed technology have been described herein for purposes of illustration, but that various modifications may be made without deviating from the scope of the invention. Accordingly, the presently disclosed technology is not limited except as by the appended claims.

Implementations of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing unit” or “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

It is intended that the specification, together with the drawings, be considered example only, where example means an example. Additionally, the use of “or” is intended to include “and/or”, unless the context clearly indicates otherwise.

While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.

Only a few implementations and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

What is claimed is:
1. A method for video processing, comprising: constructing, during a conversion between a current block of visual media data and a bitstream representation of the current block, a merge candidate list, wherein the current block comprises at least one subblock; checking one spatial neighboring block in a pre-defined relative position compared to the current block to determine a temporal motion vector offset to locate at least one collocated region in a collocated picture, wherein the at least one collocated region is used to determine a temporal motion vector prediction candidate to be added in the merge candidate list; and determining, based on the merge candidate list, motion information for the at least one subblock; and performing the conversion based on the motion information.
2. The method of claim 1, wherein the merge candidate list is a subblock merge candidate list and the temporal motion vector prediction candidate comprises a subblock based temporal motion vector prediction candidate.
3. The method of claim 1, wherein a position relation of the current block with respect to the spatial neighboring block is same as a position relation of a video block with respect to a spatial neighboring block checked in a non-subblock merge candidate list construction process of the video block.
4. The method of claim 1, wherein the spatial neighboring block is adjacent to a bottom-left corner of the current block.
5. The method of claim 1, wherein the spatial neighboring block is a spatial neighboring block A1.
6. The method of claim 5, wherein the spatial neighboring block A1 covers a luma location (xCb−1, yCb+cbHeight−1), wherein (xCb, yCb) is a luma location of the top-left sample of the current block relative to the top-left luma sample of a current picture comprising the current block and cbHeight is a height of the current block.
7. The method of claim 6, wherein the temporal motion vector offset is determined without checking any spatial neighboring block other than the spatial neighboring block A1.
8. The method of claim 1, wherein the spatial neighboring block is coded prior to performing the conversion of the current block.
9. The method of claim 1, wherein the spatial neighboring block which is determined to be available according to the checking result is within a same tile as the current block.
10. The method of claim 2, wherein the temporal motion vector prediction candidate is used to derive motion information for the at least one sub-block of the current block.
11. The method of claim 1, wherein a size of a sub-block of the current block is 8×8.
12. The method of claim 1, wherein a size of a sub-block of the current block is same as a block size.
13. The method of claim 1, wherein checking the spatial neighboring block comprises: determining whether the spatial neighboring block is available to determine the temporal motion vector offset.
14. The method of claim 1, wherein the conversion comprises encoding the current block into the bitstream representation.
15. The method of claim 1, wherein the conversion comprises decoding the current block from the bitstream representation.
16. An apparatus for processing video data comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to: construct, during a conversion between a current block of visual media data and a bitstream representation of the current block, a merge candidate list, wherein the current block comprises at least one subblock; check one spatial neighboring block in a pre-defined relative position compared to the current block to determine a temporal motion vector offset to locate at least one collocated region in a collocated picture, wherein the at least one collocated region is used to determine a temporal motion vector prediction candidate to be added in the merge candidate list; and determine, based on the merge candidate list, motion information for the at least one subblock; and perform the conversion based on the motion information.
17. The apparatus of claim 16, wherein the spatial neighboring block is adjacent to a bottom-left corner of the current block.
18. The apparatus of claim 16, wherein the spatial neighboring block is a spatial neighboring block A1.
19. A non-transitory computer-readable storage medium storing instructions that cause a processor to: construct, during a conversion between a current block of visual media data and a bitstream representation of the current block, a merge candidate list, wherein the current block comprises at least one subblock; check one spatial neighboring block in a pre-defined relative position compared to the current block to determine a temporal motion vector offset to locate at least one collocated region in a collocated picture, wherein the at least one collocated region is used to determine a temporal motion vector prediction candidate to be added in the merge candidate list; and determine, based on the merge candidate list, motion information for the at least one subblock; and perform the conversion based on the motion information.
20. A non-transitory computer-readable recording medium storing a bitstream representation which is generated by a method performed by a video processing apparatus, wherein the method comprises: constructing, during a conversion between a current block of visual media data and a bitstream representation of the current block, a merge candidate list, wherein the current block comprises at least one subblock; checking one spatial neighboring block in a pre-defined relative position compared to the current block to determine a temporal motion vector offset to locate at least one collocated region in a collocated picture, wherein the at least one collocated region is used to determine a temporal motion vector prediction candidate to be added in the merge candidate list; and determining, based on the merge candidate list, motion information for the at least one subblock; and performing the conversion based on the motion information.